SPECIFICATION

SIGNAL SEPARATION METHOD, SIGNAL SEPARATION DEVICE,
SIGNAL SEPARATION PROGRAM AND RECORDING MEDIUM



TECHNICAL FIELD 

[0001] The present invention relates to the field of signal processing, and
in particular it relates to a signal separation method, a signal separation
device and a signal separation program that can be used to estimate a source
signal (target signal) in situations where the required target signal cannot
be obtained by direct observation and the target signal is observed as a
mixture with other signals, and to a recording medium that stores such a
program.

BACKGROUND ART 

[0002] Hitherto, blind source separation (BSS) has been known as a technique
for separating and extracting the original source signals from a mixed signal
consisting of a mixture of a plurality of source signals (e.g., audio
signals), without using any prior knowledge about the source signals or the
mixing process. FIG.27A shows a block diagram that illustrates the concept of
this blind source separation technique.

As this figure shows, a plurality of (in this case, N) signal sources 701 emit
source signals s_i (i=1,...,N), which are mixed together and observed with a
plurality of (in this case, M) sensors 702, and under these conditions the
separated signals y_k (k=1,...,N) estimated to correspond to the source
signals are extracted from these observed signals x_j (j=1,...,M). Here, the
process that takes place between the emission of the source signals s_i from
signal sources 701 and the observation of these signals by sensors 702 is
referred to as the "mixing process", and the process whereby the separated
signals are extracted from the observations of sensors 702 is called the
"separation process".

[0003] To start with, the observed signals and the separation problem are
formulated as follows.

A MODEL OF MIXED SIGNALS (OBSERVED SIGNALS) IN REAL ENVIRONMENTS

First, the mixing process is modeled as follows.
Here, N is the number of signal sources 701, M is the number of sensors 702,
s_i is the signal (source signal) emitted from the i-th signal source 701
(signal source i), and h_ji is the impulse response from signal source i to
the j-th sensor 702 (sensor j). The signal x_j observed at sensor j is modeled
by the convolutive mixture of these source signals s_i and impulse responses
h_ji as follows:

FORMULA 1

$$x_j(t) = \sum_{i=1}^{N} \sum_{p=1}^{P} h_{ji}(p)\, s_i(t-p+1) \tag{1}$$

Here, the term "convolution" means that the signals are added together after
being delayed and multiplied by specific coefficients in the signal
propagation process. It is assumed that all the signals are sampled at a
certain sampling frequency and represented by discrete values. In Formula (1),
P represents the length of the impulse response, t represents the sampling
time, and p represents a sweep variable ("sweep" being an operation whereby
different coefficients are applied to each sample value of a time-shifted
signal). The N signal sources 701 are assumed to be statistically mutually
independent, and each signal is assumed to be sufficiently sparse. Here,
"sparse" means that the signal has a value of zero at most times t; sparsity
is exhibited by speech signals, for example.
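
By way of illustration only, the convolutive mixing of Formula (1) may be sketched as follows. The function name `convolutive_mix`, the array layout and the toy data are assumptions made for this sketch; array indices start at zero, so the tap index p of Formula (1) corresponds to `p - 1` here, and samples before the start of the signal are taken to be zero.

```python
import numpy as np

def convolutive_mix(s, h):
    """Mix N sources into M sensor signals following Formula (1).

    s: array of shape (N, T), source signals s_i(t)
    h: array of shape (M, N, P), impulse responses h_ji(p)
    Returns x of shape (M, T).
    """
    n_src, n_samples = s.shape
    n_sens = h.shape[0]
    x = np.zeros((n_sens, n_samples))
    for j in range(n_sens):
        for i in range(n_src):
            # The full convolution has length T + P - 1; keep the first T samples.
            x[j] += np.convolve(s[i], h[j, i])[:n_samples]
    return x

# Hypothetical toy data: N = 3 sparse sources, M = 2 sensors, P = 4 taps.
rng = np.random.default_rng(0)
s = rng.standard_normal((3, 200)) * (rng.random((3, 200)) < 0.1)
h = rng.standard_normal((2, 3, 4))
x = convolutive_mix(s, h)
```

The random mask applied to `s` merely makes most samples zero, mimicking the sparsity assumed above.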

[0004] The aim of BSS is to obtain separated signals y_k from the observed
signals x_j by estimating a separation system (W) 703 without any prior
knowledge of the source signals s_i or impulse responses h_ji.

Since convolutive mixing problems are complicated to address and the
assumption of sparsity holds better in the time-frequency domain, an effective
way of addressing the above problem involves first applying a short-time
discrete Fourier transform (DFT) to the abovementioned Formula (1) to
transform the signal into the time-frequency domain. In the time-frequency
domain, the abovementioned Formula (1) becomes

$$X(f,m) = H(f)\,S(f,m)$$

where f is the frequency, and m represents the timing of the DFT frames. H(f)
is an (M×N) matrix whose (j,i) element is the frequency response H_ji(f) from
signal source i to sensor j, and is referred to as the mixing matrix. Also,
S(f,m)=[S_1(f,m),...,S_N(f,m)]^T and X(f,m)=[X_1(f,m),...,X_M(f,m)]^T are the
DFT results obtained for the source signals and observed signals,
respectively. Here, the notation [α]^T denotes the transpose of α.
Furthermore, S(f,m) and X(f,m) are vectors.
[0005] Hereafter, explanations are given in the time-frequency domain.

MODEL OF THE SEPARATION PROCESS

The separation process is modeled as follows.

First, let W(f,m) be an (N×M) matrix whose jk element is the frequency
response W_jk(f,m) from the observed signal at sensor j to the separated
signal y_k. This matrix W(f,m) is called the separation matrix. Using the
separation matrix, the separated signals can be obtained in the
time-frequency domain as follows:

$$Y(f,m) = W(f,m)\,X(f,m)$$

Here, Y(f,m)=[Y_1(f,m),...,Y_N(f,m)]^T represents the separated signals in the
time-frequency domain, and subjecting this to a short-time inverse discrete
Fourier transform (IDFT) yields the separated signals y_k, i.e., the results
of estimating the source signals. Note that the separated signals y_k are not
necessarily ordered in the same way as the source signals s_i. That is, it is
not necessarily the case that k=i. Also, Y(f,m) is a vector.
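
As an illustrative sketch only, the separation step Y(f,m)=W(f,m)X(f,m) may be applied to every time-frequency point at once as shown below. The function name `apply_separation`, the array layout (frequency, frame, channel) and the toy data are assumptions. The demonstration uses the ideal case N=M with W(f) equal to the inverse of a known mixing matrix H(f), which is not available in a real BSS setting.

```python
import numpy as np

def apply_separation(W, X):
    """Compute Y(f,m) = W(f,m) X(f,m) for every time-frequency point.

    W: complex array of shape (F, T, N, M), or (F, N, M) for a time-invariant W(f)
    X: complex array of shape (F, T, M)
    Returns Y of shape (F, T, N).
    """
    if W.ndim == 3:
        return np.einsum('fnm,ftm->ftn', W, X)
    return np.einsum('ftnm,ftm->ftn', W, X)

# Hypothetical check with F = 4 bins, T = 5 frames and N = M = 2.
rng = np.random.default_rng(1)
F, T, N = 4, 5, 2
H = rng.standard_normal((F, N, N)) + 1j * rng.standard_normal((F, N, N))
S = rng.standard_normal((F, T, N)) + 1j * rng.standard_normal((F, T, N))
X = np.einsum('fmn,ftn->ftm', H, S)        # X(f,m) = H(f) S(f,m)
Y = apply_separation(np.linalg.inv(H), X)  # ideal W(f) = H(f)^-1
```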

[0006] ESTIMATING THE SEPARATION MATRIX W(f,m)

In BSS, the separation matrix W(f,m) is estimated by using solely the
observed signals.

Known conventional methods for estimating the separated signals Y(f,m)
include: (a) methods based on independent component analysis, (b) methods
that utilize the sparsity of the signals, and (c) methods in which the mixing
matrix is estimated based on the signal sparsity. These methods are discussed
in turn below.

CONVENTIONAL METHOD 1: INDEPENDENT COMPONENT ANALYSIS

Independent component analysis (ICA) is a technique in which signals that have
been combined by linear mixing as in Formula (1) above are separated based on
the statistical independence of the signals. FIG.27B shows a block diagram of
an ICA separation process for the case where N=M=2. In time-frequency domain
ICA, successive learning is performed with the learning rule
W(f)=W(f)+ΔW(f) to find a separation matrix W(f,m) at each frequency so that
the elements of the output signal Y(f,m) become mutually independent. Here,
the estimation unit 705 of the ICA separation matrix might determine ΔW(f) by
the following rule, for example:

$$\Delta W = \mu \left[ I - \left\langle \Phi(Y(f,m))\, Y(f,m)^{H} \right\rangle \right] W \tag{2}$$

Here, the notation [α]^H denotes the conjugate transpose of α. Also, I
represents a unit matrix, ⟨...⟩ represents time averaging, Φ represents a
nonlinear function, and μ represents the update coefficient. Separation
systems obtained by ICA are time-invariant linear systems. Various forms of
the ICA algorithm have been introduced, including the one mentioned in
Non-Patent Reference 1.

[0007] In ICA, since separation is performed by concentrating on the
independence of the signals, the matrix Y'(f,m)=[Y_1'(f,m),...,Y_N'(f,m)]^T
obtained from the relationship Y'(f,m)=W(f,m)X(f,m) using this separation
matrix W(f,m) is indeterminate with respect to the ordering and scaling of the
separated signals. This is because independence between the separated signals
is preserved even when the ordering and scaling of the signals change.

The process of resolving this indeterminacy of ordering is referred to as
permutation resolution, and results in a separated signal Y_i(f,m) where the
separated signal components corresponding to the same source signal S_i have
the same subscript i at all frequencies. Methods for achieving this include
the following. In one method, the estimated arrival directions of signals
obtained using the inverse matrix of the separation matrix (the Moore-Penrose
pseudo-inverse matrix for cases where N≠M) are verified, and the rows of the
separation matrix W(f,m) are reordered so that the estimated arrival direction
corresponding to the i-th separated signal becomes the same at each
frequency. In another method, the rows of the separation matrix W(f,m) are
reordered so as to maximize the correlation between the absolute values
|Y_i(f,m)| of the i-th separated signal between different frequencies. In this
example, a permutation/scaling solving unit 706 resolves these permutations
while feeding back the separated signals Y_i(f,m).

[0008] The process of resolving the indeterminacy of magnitude is referred to
as scaling resolution. Permutation/scaling solving unit 706 performs this
scaling resolution by, for example, calculating the inverse matrix (the
Moore-Penrose pseudo-inverse matrix for cases where N≠M) W^-1(f,m) of the
separation matrix W(f,m) obtained after permutation resolution, and then
scaling each row W_i(f,m) of the separation matrix W(f,m) as follows:

$$W_i(f,m) \leftarrow \left[ W^{-1}(f,m) \right]_{ji} W_i(f,m)$$

The separated signals at each frequency can then be obtained from
Y(f,m)=W(f,m)X(f,m) by using the separation matrix W(f,m) in which the
indeterminacy of ordering and magnitude has been resolved.
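
The scaling step may be sketched as follows, purely as an illustration. It is assumed here that the scaling factor applied to row i is the (j,i) element of the (pseudo-)inverse of W for a chosen reference sensor j; the function name `resolve_scaling`, the parameter `ref_sensor` and the toy matrices are hypothetical.

```python
import numpy as np

def resolve_scaling(W, ref_sensor=0):
    """Rescale each row W_i by the (ref_sensor, i) element of the (pseudo-)inverse of W.

    W: (N, M) separation matrix at one frequency, after permutation resolution.
    """
    # pinv equals the ordinary inverse when W is square and non-singular.
    W_inv = np.linalg.pinv(W)
    scale = W_inv[ref_sensor, :]          # one factor per row i of W
    return scale[:, np.newaxis] * W

# Hypothetical check: W separates the sources up to an arbitrary row scaling D.
H = np.array([[1.0, 2.0], [0.5, -1.0]])   # assumed mixing matrix, N = M = 2
D = np.diag([3.0, -0.25])                 # arbitrary scaling left by ICA
W_scaled = resolve_scaling(D @ np.linalg.inv(H))
```

In this toy case the rescaled matrix no longer depends on D: each output equals the corresponding source as it would be observed at the reference sensor.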
[0009] With regard to the abovementioned learning rule, it is possible to use
a function like

$$\Phi(Y) = \varphi(|Y|) \cdot \exp\left( j \cdot \angle(Y) \right)$$
$$\varphi(x) = \mathrm{sign}(x)$$

as the nonlinear function in Formula (2). Also, as mentioned above, it is
possible to use any permutation resolution method such as the signal arrival
direction estimation method or the method that utilizes the similarity in the
frequency components of the separated signals, or a combination of such
methods, details of which can be found in Patent Reference 1 and Non-Patent
Reference 2. Furthermore, a requirement of ICA is that the number of signal
sources N and the number of sensors M obey the relationship M≥N.

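
As a minimal sketch under stated assumptions, one step of the learning rule of paragraph [0006] with the sign-based nonlinear function given above might look as follows for a single frequency bin. The trailing multiplication by W (the common natural-gradient form), the step size `mu` and the names `phi` and `ica_update` are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def phi(Y):
    """Phi(Y) = sign(|Y|) * exp(j * angle(Y)): keeps the phase, discards the magnitude."""
    return np.sign(np.abs(Y)) * np.exp(1j * np.angle(Y))

def ica_update(W, X, mu=0.1):
    """One learning step W <- W + dW for a single frequency bin.

    W: (N, M) separation matrix, X: (M, T) observed frames at this frequency.
    """
    Y = W @ X
    n_frames = X.shape[1]
    corr = phi(Y) @ Y.conj().T / n_frames          # time average <Phi(Y) Y^H>
    dW = mu * (np.eye(W.shape[0]) - corr) @ W      # assumed natural-gradient form
    return W + dW

# Hypothetical fixed-point check: these outputs already satisfy <Phi(Y) Y^H> = I,
# so the update leaves W unchanged.
X_demo = np.array([[1, 1], [1, -1]], dtype=complex)
W_next = ica_update(np.eye(2, dtype=complex), X_demo)
```

In practice the step would be repeated until ΔW becomes negligible, separately at each frequency.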
[0010] CONVENTIONAL METHOD 2: THE SPARSITY METHOD

In cases where the number of signal sources N and the number of sensors M obey
the relationship M<N, separation can be achieved by methods based on the
signal sparsity (e.g., Non-Patent Reference 3).

By assuming the signals to be sparse and mutually independent, even when a
plurality of signals are present at the same time, it can be assumed that at
the sample level there is a low probability of observing overlapping signals
at the same timing. That is, it can be assumed that there is no more than one
signal contained in the observed signal at any one time. Accordingly, the
signals can be separated by using a separation system W(f,m) consisting of a
function (binary mask) that uses some method to estimate which signal source
emitted the signal observed at each timing and only extracts signals at this
timing. This is the sparsity method.
[0011] FIG.28 (conventional method 2) shows a block diagram to illustrate this
sparsity method.

The following method is generally used to estimate the signal source at each
timing. If each signal source is assumed to be spatially separate, then
between the signals observed by the plurality of sensors there will exist
phase differences and amplitude ratios determined by the relative positions of
the signal sources and sensors. From the assumption that there is at most one
signal contained in the observed signal at each timing, the phase differences
and amplitude ratios of the observed signal at this timing correspond to the
phase and amplitude of the one signal contained in the observed signal at this
timing. Accordingly, the phase differences and amplitude ratios of the
observed signal in each sample can be subjected to a clustering process, and
each source signal can be estimated by reconstituting the signals belonging to
each cluster.

[0012] This is described in more specific detail as follows. First, observed
signal relative value calculation unit 751 calculates the phase differences
and/or amplitude ratios between the observed signals X(f,m) to obtain relative
values z(f,m) as follows:

FORMULA 2

$$\text{Phase difference: } z_1(f,m) = \angle \frac{X_i(f,m)}{X_j(f,m)} \quad (i \ne j)$$

$$\text{Amplitude ratio: } z_2(f,m) = \frac{|X_i(f,m)|}{|X_j(f,m)|} \quad (i \ne j)$$

Alternatively, instead of using the phase difference itself, it is also
possible to use the signal arrival directions derived from the phase
differences as relative values z(f,m).
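
For illustration, the two relative values may be computed from the time-frequency coefficients of two sensors as sketched below. The function name `relative_values` and the small `floor` that guards against division by zero are assumptions of this sketch. The phase difference is computed as the angle of X_i multiplied by the conjugate of X_j, which equals the angle of X_i/X_j whenever X_j is non-zero.

```python
import numpy as np

def relative_values(X_i, X_j, floor=1e-12):
    """Return (phase difference z_1, amplitude ratio z_2) between two sensors.

    X_i, X_j: complex arrays of time-frequency coefficients with identical shape.
    """
    phase_diff = np.angle(X_i * np.conj(X_j))
    amp_ratio = np.abs(X_i) / np.maximum(np.abs(X_j), floor)
    return phase_diff, amp_ratio

# Hypothetical coefficients at two time-frequency points.
X_i = np.array([2.0 * np.exp(0.5j), 1.0 + 0.0j])
X_j = np.array([1.0 * np.exp(0.2j), 0.0 + 2.0j])
z1, z2 = relative_values(X_i, X_j)
```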

[0013] Next, the distribution of the relative values z(f,m) is checked and
clustered into N clusters by clustering unit 752. An example of such a
distribution is shown in FIG.29. In this example, a mixed signal comprising
three signals (N=3) is observed by sensor 1 (j=1) and sensor 2 (j=2). FIG.29A
shows the distribution obtained using the phase difference or amplitude ratio
alone, and FIG.29B shows the distribution obtained using both the phase
difference and the amplitude ratio. As this figure shows, sparsity allows
these distributions to be classified into N=3 clusters 801-803 or 811-813.

[0014] Next, the representative values (peak, mean, median, etc.) of these N
clusters are obtained in representative value calculation unit 753. In the
following discussion, for the sake of convenience, these are numbered
a_1, a_2, ..., a_N in ascending order (in FIG.29 they are numbered a_1, a_2
and a_3).

Next, in binary mask preparation unit 754, a binary mask M_k(f,m) is prepared
as follows:

FORMULA 3

$$M_k(f,m) = \begin{cases} 1 & a_k - \varepsilon < z(f,m) < a_k + \varepsilon \\ 0 & \text{otherwise} \end{cases} \qquad (k = 1,\ldots,N) \tag{3}$$

Here, ε is a parameter that determines the width of the binary mask. Next, in
signal extraction unit 755, the k-th separated signal is obtained by
performing the calculation Y_k(f,m)=M_k(f,m)X_j(f,m), where j is an arbitrary
sensor number.

That is, the method based on sparsity described in this example results in a
nonlinear system with a time-varying separation matrix W(f,m):

$$W_{jk}(f,m) = M_k(f,m) \quad \text{for } j \in \{1,\ldots,M\}$$
$$W_{kl}(f,m) = 0 \quad \text{for } l \ne j \ (l = 1,\ldots,M)$$

[0015] CONVENTIONAL METHOD 3: ESTIMATING THE MIXING MATRIX BASED ON SPARSITY

In this method, as a signal separation technique for cases where the number of
signal sources N and the number of sensors M obey the relationship M=N, the
sparsity of the signals is used to estimate the mixing matrix H(f), and the
inverse matrix thereof is used to separate the signals (see, e.g., Non-Patent
Reference 4 and Non-Patent Reference 5).

FIG.28 (conventional method 3) shows a block diagram illustrating this method
for estimating the mixing matrix based on sparsity.

The mixed signal X(f,m) is expressed in terms of the mixing matrix H(f) as
follows:

FORMULA 4

$$\begin{aligned}
\begin{bmatrix} X_1(f,m) \\ X_2(f,m) \\ \vdots \\ X_N(f,m) \end{bmatrix}
&= \begin{bmatrix} H_{11}(f) & \cdots & H_{1N}(f) \\ H_{21}(f) & \cdots & H_{2N}(f) \\ \vdots & & \vdots \\ H_{N1}(f) & \cdots & H_{NN}(f) \end{bmatrix}
\begin{bmatrix} S_1(f,m) \\ S_2(f,m) \\ \vdots \\ S_N(f,m) \end{bmatrix} && (4) \\
&= \begin{bmatrix} 1 & \cdots & 1 \\ H_{21}(f)/H_{11}(f) & \cdots & H_{2N}(f)/H_{1N}(f) \\ \vdots & & \vdots \\ H_{N1}(f)/H_{11}(f) & \cdots & H_{NN}(f)/H_{1N}(f) \end{bmatrix}
\begin{bmatrix} H_{11}(f)S_1(f,m) \\ H_{12}(f)S_2(f,m) \\ \vdots \\ H_{1N}(f)S_N(f,m) \end{bmatrix} \\
&= \hat{H}(f)\,\hat{S}(f,m) && (6)
\end{aligned}$$

Thus, if H^(f) can be estimated, then the separated signals Y(f,m) can be
estimated from

$$Y(f,m) = \hat{S}(f,m) = \hat{H}(f)^{-1} X(f,m) \tag{7}$$

This procedure for obtaining the separated signals Y(f,m) from the estimated
H^(f) is described below. In the following, the notation α^ is equivalent to
the notation $\hat{\alpha}$.

[0016] First, signals at timings where only one signal is present are obtained
by applying the same procedure as in [Conventional method 2] in observed
signal relative value calculation unit 751, clustering unit 752,
representative value calculation unit 753, binary mask preparation unit 754
and signal extraction unit 755:

FORMULA 5

$$\hat{X}(f,m) = M_k(f,m)\,X(f,m)$$

Here, binary masks M_k(f,m) are applied to the observed signals X(f,m) =
[X_1(f,m),...,X_M(f,m)]^T of all the sensors. At this time, the timing m_i at
which only source signal S_i(f,m) is active, for example, can be expressed as
follows:

FORMULA 6

$$\hat{X}_j(f,m_i) = M_i(f,m_i)\,X_j(f,m_i) \approx H_{ji}(f)\,S_i(f,m_i) \tag{8}$$

The separated signals X^_j(f,m_i) obtained in this way are sent to mixing
process calculation unit 756, where H^(f) is estimated by performing the
following calculation:

$$\hat{H}_{ji}(f) = E\left[\frac{M_i(f,m_i)\,X_j(f,m_i)}{M_i(f,m_i)\,X_1(f,m_i)}\right] = E\left[\frac{\hat{X}_j(f,m_i)}{\hat{X}_1(f,m_i)}\right] = E\left[\frac{H_{ji}(f)\,S_i(f,m_i)}{H_{1i}(f)\,S_i(f,m_i)}\right] = E\left[\frac{H_{ji}(f)}{H_{1i}(f)}\right] \tag{9}$$

where E[...] denotes averaging over m_i. The matrix H^(f) obtained in this way
is sent to inverse matrix calculation unit 757, where its inverse matrix
H^(f)^-1 is obtained. Then, in signal separation unit 758, the calculation
shown in Formula (7) above yields estimates of the separated signals Y(f,m).

Note that since this procedure uses the inverse matrix of H^(f), it can only
be applied in cases where the number of signal sources N and the number of
sensors M obey the relationship M=N.
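
The estimation of the normalized mixing matrix and the subsequent inversion may be sketched as follows for one frequency, by way of example only. The function name `estimate_mixing_matrix`, the array layout and the toy data are assumptions. In particular, the binary masks are taken here from the known source activity to keep the sketch short, whereas in the method described they result from clustering the relative values.

```python
import numpy as np

def estimate_mixing_matrix(X, masks):
    """Estimate H^(f) at one frequency from frames where a single source is active.

    X: (M, T) observed coefficients; masks: (N, T) binary masks M_i(f, m).
    Returns H_hat of shape (M, N), with H_hat[j, i] the average of X_j / X_1
    over the frames selected by mask i. A mask that selects no frame gives NaN.
    """
    n_sens, n_src = X.shape[0], masks.shape[0]
    H_hat = np.zeros((n_sens, n_src), dtype=complex)
    for i in range(n_src):
        frames = masks[i].astype(bool)
        H_hat[:, i] = np.mean(X[:, frames] / X[0, frames], axis=1)
    return H_hat

# Hypothetical, perfectly sparse example with N = M = 2 and T = 6 frames.
S = np.zeros((2, 6), dtype=complex)
S[0, :3] = [1.0, 2.0j, -1.0]
S[1, 3:] = [0.5, 1.0j, 2.0]
H = np.array([[1.0, 2.0], [0.5j, -1.0]])
X = H @ S
masks = (np.abs(S) > 0).astype(float)   # oracle masks, for illustration only
H_hat = estimate_mixing_matrix(X, masks)
Y = np.linalg.inv(H_hat) @ X            # Formula (7)
```

Each row of `Y` then equals the corresponding source scaled by its response at the first sensor, in line with the definition of S^(f,m) in Formula (6).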

[Patent Reference 1] Japanese Unexamined Patent Publication No. 2004-145172

[Non-Patent Reference 1] A. Hyvärinen, J. Karhunen and E. Oja, "Independent
Component Analysis," John Wiley & Sons, 2001, ISBN 0-471-40540

[Non-Patent Reference 2] H. Sawada, R. Mukai, S. Araki and S. Makino, "A
Robust and Precise Method for Solving the Permutation Problem of
Frequency-Domain Blind Source Separation," in Proc. the 4th International
Symposium on Independent Component Analysis and Blind Signal Separation
(ICA2003), 2003, pp. 505-510

[Non-Patent Reference 3] S. Rickard, R. Balan and J. Rosca, "Real-Time
Time-Frequency Based Blind Source Separation," 3rd International Conference on
Independent Component Analysis and Blind Source Separation (ICA2001), San
Diego, December 2001, pp. 651-656

[Non-Patent Reference 4] F. Abrard, Y. Deville and P. White, "From blind
source separation to blind source cancellation in the underdetermined case: a
new approach based on time-frequency analysis," Proceedings of the 3rd
International Conference on Independent Component Analysis and Signal
Separation (ICA'2001), San Diego, California, Dec. 2001, pp. 734-739

[Non-Patent Reference 5] Y. Deville, "Temporal and time-frequency
correlation-based blind source separation methods," in Proc. ICASSP2003, Apr.
2003, pp. 1059-1064

DISCLOSURE OF THE INVENTION

PROBLEM ADDRESSED BY THE INVENTION

[0017] In conventional signal separation methods, when the number of signal
sources N and the number of sensors M obey the relationship N>M, it has been
difficult to achieve high-quality separation of the mixed signals.

Specifically, as mentioned above, when the number of signal sources N and the
number of sensors M obey the relationship N>M, it is not possible to use
methods based on independent component analysis or methods in which the mixing
matrix is estimated based on sparsity.

Also, although it is possible to use methods that exploit the sparsity of
signals, with these methods it is difficult to achieve signal separation with
good separation performance and low distortion. Specifically, when creating a
binary mask as shown in Formula (3) above, it is possible to achieve good
separation performance if ε is made small enough, but on the other hand this
increases the number of samples eliminated by this binary mask and degrades
the separated signal. In other words, when the signals are completely sparse
so that the observed signal contains at most one signal at each timing, then
the relative values z(f,m) at each timing should cluster in the vicinity of
one of the representative values a_1,...,a_N. However, since real signals are
not completely sparse, there will also be cases where two or more source
signals are present at the same timing and frequency. In such cases, the
relative values z(f,m) at this timing deviate from the representative values
a_1,...,a_N that would otherwise be expected, and, depending on the value of
ε, the corresponding samples are excluded by binary masking. As a result, the
observed signal corresponding to such a sample is treated as being zero, and a
zero component is padded into the separated signal. Since the proportion of
samples excluded in this way increases as the value of ε gets smaller, the
number of samples padded with a zero component also increases as ε gets
smaller. When there are many zero components padded into each separated
signal, this causes the distortion of the separated signals to increase,
resulting in the generation of a perceptually uncomfortable type of noise
called "musical noise". On the other hand, if the value of ε used for binary
masking is made large, then fewer zero components are padded into the
separated signals and the occurrence of musical noise decreases, but instead
the separation performance deteriorates.
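
The trade-off described above can be made concrete with a small sketch of the binary mask of Formula (3). The function names `binary_mask` and `discarded_fraction`, the relative values `z` and the representative values `reps` are hypothetical and chosen only for illustration; the mask uses the strict inequalities as they appear in Formula (3).

```python
import numpy as np

def binary_mask(z, a_k, eps):
    """Binary mask of Formula (3): 1 where a_k - eps < z < a_k + eps, else 0."""
    return ((z > a_k - eps) & (z < a_k + eps)).astype(float)

def discarded_fraction(z, reps, eps):
    """Fraction of samples that no mask keeps, i.e. samples padded with zeros."""
    kept = np.zeros_like(z)
    for a_k in reps:
        kept = np.maximum(kept, binary_mask(z, a_k, eps))
    return 1.0 - kept.mean()

# Hypothetical relative values; 0.2 and 0.8 mimic samples where sources overlap.
z = np.array([0.0, 0.05, 0.2, 0.45, 0.5, 0.55, 0.8, 1.0])
reps = [0.0, 0.5, 1.0]                  # assumed representative values a_1..a_3
wide = discarded_fraction(z, reps, eps=0.1)
narrow = discarded_fraction(z, reps, eps=0.01)
```

With these toy values the narrow mask discards more samples than the wide one, which corresponds to the increase in zero padding as ε is reduced.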

[0018] The present invention has been made in the light of such problems, and
aims to provide a technique that can perform high-quality separation of mixed
signals even in cases where the relationship between the number of signal
sources N and the number of sensors M is such that N>M.

MEANS OF SOLVING THE PROBLEM 
[0019] In the first present invention, the abovementioned problem is solved as
follows:

First, the values of the observed signal, which is a mixture of N (N≥2)
signals observed by M sensors, are transformed into frequency domain values,
and these frequency domain values are used to calculate the relative values of
the observed values between the sensors (including the mapping of relative
values) at each frequency. These relative values are clustered into N
clusters, and the representative value of each cluster is calculated. Then,
using these representative values, a mask is produced to extract the values of
the signals emitted by V (V≤M) signal sources from the frequency domain
values, and this mask is used to extract the values of a limited signal
comprising the signals emitted from these V signal sources. When V≥2, this
limited signal is a mixed signal comprising the signals emitted by V signal
sources, so this limited signal is further separated to yield each of the
separated signal values. On the other hand, when V=1, the values of this
limited signal are regarded as the separated signal values.

[0020] To separate limited signals consisting of signals emitted from V signal
sources extracted in this way, it is possible to employ methods such as an
independent component analysis method or a method in which the mixing matrix
is estimated based on sparsity, for example. Consequently, it is possible to
extract source signals with high quality even in cases where N>M. However,
with this approach alone it is only possible to extract V source signals.
Therefore, all the source signals are extracted by, for example, repeating the
same processing while using a plurality of different types of mask to change
the combinations of extracted signals.

In the second present invention, the abovementioned problem is solved as
follows:

[0021] First of all, the observed signal values x_1(t),...,x_M(t) are
transformed into frequency domain values X_1(f,m),...,X_M(f,m). First vectors
X(f,m)=[X_1(f,m),...,X_M(f,m)]^T consisting of the frequency domain values
X_1(f,m),...,X_M(f,m) are then clustered into N clusters C_i(f) (i=1,...,N)
for each frequency f, a second vector a_i(f) representative of each cluster
C_i(f) is extracted, and V (V≤M) third vectors a_p(f) (p=1,...,V) are
extracted therefrom. A mask M(f,m) is then produced according to the following
formula, where G_k is the set of third vectors a_p(f), G_k^c is the
complementary set of G_k, and the notation D(α,β) represents the squared
Mahalanobis distance between vectors α and β:

FORMULA 7

$$M(f,m) = \begin{cases} 1 & \max_{a_p(f) \in G_k} D(X(f,m), a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m), a_q(f)) \\ 0 & \text{otherwise} \end{cases}$$

and the products of mask M(f,m) with the first vectors X(f,m) are calculated
to extract the values of the limited signal consisting of the signals emitted
from V signal sources.
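
A minimal sketch of the mask of Formula 7 at a single time-frequency point is given below. The covariance used in the Mahalanobis distance is not specified in this passage, so the sketch assumes one shared inverse covariance `cov_inv`, defaulting to the identity (in which case D reduces to the squared Euclidean distance). The function names, the index-set representation of G_k and the toy vectors are likewise assumptions.

```python
import numpy as np

def sq_distance(x, a, cov_inv=None):
    """Squared Mahalanobis distance D(x, a) under an assumed shared inverse covariance."""
    d = np.asarray(x) - np.asarray(a)
    if cov_inv is None:
        cov_inv = np.eye(d.shape[0])    # identity: squared Euclidean distance
    return float(np.real(np.conj(d) @ cov_inv @ d))

def limited_signal_mask(x, reps, G_k, cov_inv=None):
    """Return 1.0 if x is closer to every vector in G_k than to any vector outside it.

    x: (M,) observation vector X(f,m); reps: (N, M) representative vectors a_i(f);
    G_k: set of row indices of reps forming the set G_k (non-empty, fewer than N).
    """
    inside = [sq_distance(x, reps[p], cov_inv) for p in G_k]
    outside = [sq_distance(x, reps[q], cov_inv)
               for q in range(len(reps)) if q not in G_k]
    return 1.0 if max(inside) < min(outside) else 0.0

# Hypothetical representative vectors for N = 3 sources and M = 2 sensors.
reps = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
G_k = {0, 1}
m_in = limited_signal_mask(np.array([0.6, 0.6]), reps, G_k)    # between a_1 and a_2
m_out = limited_signal_mask(np.array([-0.9, 0.1]), reps, G_k)  # close to a_3
```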

[0022] To separate limited signals consisting of signals emitted from V signal
sources extracted in this way, it is possible to employ a method such as
independent component analysis or a method in which the mixing matrix is
estimated based on sparsity, for example. Consequently, it is possible to
extract source signals with high quality even in cases where N>M. However,
with this approach alone it is only possible to extract V source signals.
Therefore, for example, the same processing is repeated while using a
plurality of different types of mask on a plurality of different types of set
G_k to change the combinations of extracted signals. In this way, all the
source signals are extracted.

[0023] In the third present invention, the abovementioned problem is solved as
follows:

First, the observed signal values x_1(t),...,x_M(t) are transformed into
frequency domain values X_1(f,m),...,X_M(f,m), and first vectors
X(f,m)=[X_1(f,m),...,X_M(f,m)]^T consisting of these values are clustered into
N clusters C_i(f) (i=1,...,N) for each frequency f. If the source signals are
sparse, then even in situations where there is an insufficient number of
sensors (N>M), it is still possible to cluster these vectors into N clusters
and to calculate representative vectors a_i(f) for each of these N clusters.

[0024] Second vectors a_i(f) are then calculated to represent each of these
clusters C_i(f), and an N-row × M-column separation matrix W(f,m) is
calculated as the Moore-Penrose pseudo-inverse matrix of an M-row × N-column
matrix A' in which 0 or more of the N said second vectors a_i(f) are
substituted with zero vectors (this matrix is denoted by A'^+(f), and is
identical to the inverse matrix A'^-1 when N=M). The separation matrix W(f,m)
generated here is a matrix that depends on time m in cases where the number of
sensors is insufficient (N>M), and is independent of time m in cases where the
number of sensors is sufficient (N≤M).
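
The construction of the separation matrix as a pseudo-inverse may be sketched as follows, as an illustration under stated assumptions. Which columns are kept at a given frame is not specified in this passage, so the sketch takes the indices `active` as an input; the function name `separation_matrix` and the toy matrix A are hypothetical.

```python
import numpy as np

def separation_matrix(A, active):
    """Return W = pinv(A'), where A' keeps the columns in `active` and zeroes the rest.

    A: (M, N) matrix whose columns are the representative vectors a_i(f).
    Returns W of shape (N, M).
    """
    A_prime = np.zeros_like(A)
    A_prime[:, active] = A[:, active]
    return np.linalg.pinv(A_prime)

# Hypothetical case with M = 2 sensors and N = 3 sources; at this frame only
# sources 0 and 2 are assumed to be active.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
s = np.array([0.7, 0.0, -1.2])
X = A @ s
W = separation_matrix(A, active=[0, 2])
Y = W @ X
```

In this toy frame the two active sources are recovered and the output for the inactive source is zero.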

[0025] After that, the calculation Y(f,m)=W(f,m)X(f,m) is performed to
calculate a separated signal vector Y(f,m)=[Y_1(f,m),...,Y_N(f,m)]^T, which is
transformed into time-domain signal values y_1(t),...,y_N(t).

Here, due to the sparsity of the source signals, even if the number of signal
sources N is greater than the number of sensors M (N>M), it is still highly
probable that there are no more than M signal sources having values that
affect the observation result in any given discrete time interval m.
Consequently, for each discrete time interval m, the separation matrix W(f,m)
generated as described above is able to separate these M or fewer signals.
Then, in cases where N>M, since the separation matrix W(f,m) is
time-dependent, the resulting combinations of separated signals are liable to
differ between different discrete time intervals. Consequently, by obtaining
separated signals for a plurality of discrete time intervals m, it is possible
to obtain all the separated signals.

ADVANTAGES OF THE INVENTION

[0026] As described above, with the present invention it is possible to
perform high-quality separation of mixed signals even when the relationship
between the number of signal sources N and the number of sensors M is such
that N>M.

BRIEF DESCRIPTION OF THE FIGURES

[0027] [FIG.1] A block diagram showing an example of the overall
configuration of a signal separation device according to a first embodiment.

[FIG.2] A block diagram showing examples of the detailed configuration of the
representative value generation unit, mask control unit, limited signal
generation unit and limited signal separation unit in FIG.1.

[FIG.3] A block diagram showing an example of the detailed configuration of
the mask generation unit in FIG.1 and FIG.2.

[FIG.4] A flowchart illustrating the processing performed by a signal
separation device according to the first embodiment.

[FIG.5] An example of a histogram produced by the clustering unit.

[FIG.6] A figure illustrating the method used to define the estimated arrival
direction θ_i of signals used when generating a mask with a smooth profile in
the first embodiment.

[FIG.7] An example of a mask in the first embodiment.

[FIG.8] A block diagram showing an example of one system of a signal
separation device according to a second embodiment.

[FIG.9] A block diagram showing an example of a single system of a signal
separation device according to a third embodiment.

[FIG.10] An example of a mask in the third embodiment.

[FIG.11] A block diagram showing an example of the configuration of a mask
generation unit according to a fourth embodiment.

[FIG.12] An example of a binary mask in a sixth embodiment (A), and an example
of a binary mask in a seventh embodiment (B).

[FIG.13] A block diagram showing examples of the configuration of the
representative value generation unit, mask control unit and limited signal
generation unit according to an eighth embodiment.

[FIG.14] A flowchart illustrating the signal separation process in the eighth
embodiment.

[FIG.15] A block diagram showing an example of the configuration of a signal
separation device according to a ninth embodiment.

[FIG.16] A flowchart illustrating the processing performed by a signal
separation device according to the ninth embodiment.

[FIG.17] A flowchart illustrating the separation matrix generation process
performed when there is an insufficient number of sensors (M<N).

[FIG.18] A plot of the observed signal vectors X(f,m) from a single audio
source before normalization.

[FIG.19] A plot of the observed signal vectors X(f,m) from a single audio
source normalized by Formula (36).

[FIG.20] A plot of the observed signal vectors X(f,m) from a single audio
source normalized by Formula (37).

[FIG.21] A plot of the observed signal vectors X(f,m) from two audio sources
before normalization.

[FIG.22] A plot of the observed signal vectors X(f,m) from two audio sources
normalized by Formula (36).

[FIG.23] A plot of the observed signal vectors X(f,m) from two audio sources
normalized by Formula (37).

[FIG.24] A flowchart illustrating a separation matrix generation process that
can be applied regardless of whether or not there is a sufficient number of
sensors with respect to the number of signal sources.

[FIG.25] A partial block diagram showing an example of the configuration
employed for transformation into the time domain after performing signal
combination in the frequency domain.

[FIG.26] An example of a signal separation device wherein each embodiment is
configured by a computer.

[FIG.27] A block diagram showing a conceptual example of a conventional blind
source separation technique (A), and a block diagram of an ICA separation
process (B).

[FIG.28] A block diagram illustrating a method based on sparsity and a method
for estimating a mixing matrix based on sparsity.

[FIG.29] An example of the distribution of relative values.
LIST OF REFERENCE NUMERALS 
[0028] 1, 500: Signal separation device 

2, 501: Memory unit

3, 502: Signal separation processor
BEST MODES FOR IMPLEMENTING THE INVENTION
[0029] Embodiments of the present invention are described below with 
reference to the figures. 
FIRST EMBODIMENT 
This embodiment relates to the first present invention; in this
example, the values of a mixed signal comprising the signals
emitted from V (2≤V≤M) sources (referred to as a "limited signal" in this
embodiment) are extracted from the observed signal values by using a mask
with a smooth profile that uses the directional characteristics of a null
beamformer, and ICA is used to perform signal separation on the extracted
limited signal values.

FIG.1 shows a block diagram of an example of the overall
configuration of a signal separation device 1 according to this embodiment.
FIG.2 shows block diagrams of examples of the detailed configuration of
representative value generation unit 30, mask control unit 40, limited signal
generation unit 50-k (k=1,...,u; where u is the number of systems as described
below), and limited signal separation unit 60-k. FIG.3 shows a block diagram
of an example of the detailed configuration of mask generation unit 51-k in
FIG.1 and FIG.2. The arrows in these figures indicate the flow of data, but the
flow of data into and out from control unit 10 and temporary memory unit 90
is not shown. Specifically, even when data passes through control unit 10 or
temporary memory unit 90, the associated process is not shown. FIG.4 shows
a flowchart illustrating the processing performed by signal separation device 1
in this embodiment. In the following, these figures are used to describe the
configuration of signal separation device 1 in this example, and the processing
performed by this device.
[0030] OVERALL CONFIGURATION

First, the overall configuration of the signal separation device of 
this embodiment is described. 

As FIG.1 shows, the signal separation device 1 of this
embodiment includes a memory unit 2 and a signal separation processor 3 that
is electrically connected thereto by a hard-wired or wireless connection.

Memory unit 2 might be, for example, a hard disk device, a
magnetic recording device such as a flexible disk or magnetic tape device, an
optical disk device such as a DVD-RAM (random access memory) or CD-R
(recordable)/RW (rewritable) device, a magneto-optical recording device such
as an MO (magneto-optical) disc device, or a semiconductor memory such as
an EEPROM (electrically erasable programmable read-only memory) or
flash memory. Memory unit 2 may be situated inside the same enclosure as
signal separation processor 3, or it may be housed separately.
[0031] This signal separation processor 3 consists of hardware configured
from elements such as a processor and RAM, for example, and incorporates
the processing blocks described below.
SUMMARY OF THE SIGNAL SEPARATION PROCESS 

Next, the signal separation processing performed by signal
separation device 1 is summarized.

In this embodiment, it is assumed that the signals emitted from
the N signal sources are statistically independent of each other, and that each
signal is sufficiently sparse. Here, "sparse" refers to the property of a signal
that is zero or close to zero at almost all times t, and rarely takes a large value.
This sort of sparsity is found to occur in speech signals, for example. Note
that when speech signals and other signals that do not consist of white noise
are converted into time-series data at different frequencies by performing a
transformation such as a short-time discrete Fourier transform, the proportion
of time points at which the signals are close to zero becomes even larger, thereby
accentuating their sparsity. Also, although Gaussian distributions are generally
used to model signals, sparse signals are modeled not with Gaussian
distributions but with other forms of distribution such as a Laplace
distribution.

[0032] First, the M observed signal values x_j(t) are converted into
frequency domain observed signal values X_j(f,m) by a frequency domain
transformation unit 20, and then a representative value generation unit 30
calculates N representative values a_1,a_2,...,a_N corresponding to each source
signal.

Next, V (2≤V≤M) of the representative values a_1,a_2,...,a_N are
suitably selected by mask control unit 40, and in limited signal generation unit
50-k, the values X̂(f,m) of a limited signal consisting only of V source
signals are estimated from the observed signal values X_j(f,m). Note that when
V=1, the method described under [Third embodiment] below is used. Here, a
mask with a smooth profile is produced in mask generation unit 51-k to
extract V signals, and in limited signal extraction unit 52-k this mask is
applied to the observed signal values X_j(f,m) to estimate the limited signal
values X̂(f,m).

[0033] Next, in limited signal separation unit 60-k, a separation system is
estimated for obtaining V separated signals. Here, M limited signal values
X̂(f,m) are provided as inputs, and V separated signal values Y(f,m) are
obtained as outputs. With regard to the number of inputs M and outputs V in
the separation system, since V≤M it is possible to use [Conventional method
1] or [Conventional method 3] to perform the estimation in this separation
system.
[0034] Finally, in time domain transformation unit 70-k, the separated
signal values Y(f,m) obtained in the time-frequency domain are transformed
into time-domain signal values.

However, with the above processing alone, it is only possible to
obtain V separated signals. Therefore, to obtain the other separated signals,
the configuration of the V representative values selected in mask control unit
40 is changed, and the processing from limited signal generation unit 50-k to
time domain transformation unit 70-k is performed in multiple systems (u
systems).

Finally, the outputs from each system are combined in signal
combination unit 80 to yield all N separated signals.
[0035] DETAILED CONFIGURATION AND PROCESSING 

Next, the configuration and processing of this example are 
described in detail. 

This example relates to a device that separates and extracts source
signals from observed signals in situations where the signals emitted from N
(N≥2) signal sources are mixed together and observed by M sensors. Note
that, as mentioned above, the signals in this example are signals that can be
assumed to be sparse, such as speech signals, and the number of audio sources
N is either known or can be estimated. Also, in this example it is assumed that
the sensors are microphones or the like that are capable of observing these
signals and are arranged on a straight line.

[0036] First, as a preliminary process, the time-domain observed signals
x_j(t) (j=1,...,M) observed by each sensor are stored in memory unit 2. Then,
when the signal separation process is started, signal separation processor 3
performs the following processing under the control of control unit 10.

First, signal separation processor 3 accesses memory unit 2, from
where it sequentially reads in the observed signal values x_j(t), which it sends
to frequency domain transformation unit 20 (Step S1). Frequency domain
transformation unit 20 uses a transformation such as a short-time discrete
Fourier transform to transform these signal values into a series of
frequency-domain observed signal values X_j(f,m) for each time interval, which it stores
in temporary memory unit 90 (Step S2). The frequency-domain observed
signal values X_j(f,m) stored in temporary memory unit 90 are sent to a
representative value generation unit 30, and the relative value calculation unit
31 of representative value generation unit 30 uses these frequency-domain
observed signal values X_j(f,m) to calculate the relative values z(f,m) of the
observed values between each sensor at each frequency (Step S3).
[0037] The relative values z(f,m) may be obtained by using one or more
parameters such as the phase difference or amplitude ratio, which are
expressed as follows:

FORMULA 8

Phase difference: $$z_1(f,m) = \angle \frac{X_i(f,m)}{X_j(f,m)} \quad (i \neq j)$$

Amplitude ratio: $$z_2(f,m) = \frac{|X_i(f,m)|}{|X_j(f,m)|} \quad (i \neq j)$$

Alternatively, instead of using the phase difference itself, it is also possible to
use a mapping thereof (e.g., the signal arrival directions derived from the
phase differences).
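
As an illustration of Formula 8, the following minimal Python sketch computes both relative values for a single time-frequency bin. The function name relative_values and the sample values are hypothetical and are not part of this specification.

```python
import cmath

def relative_values(x_i, x_j):
    """Return (z1, z2) for one time-frequency bin.

    x_i and x_j are the frequency-domain observations X_i(f,m) and X_j(f,m)
    of two different sensors (i != j); x_j must be nonzero.
    """
    z1 = cmath.phase(x_i / x_j)   # phase difference, in (-pi, pi]
    z2 = abs(x_i) / abs(x_j)      # amplitude ratio
    return z1, z2

# Hypothetical bin: sensor i leads sensor j by 0.5 rad and is twice as strong.
x_j = 1.0 + 0.0j
x_i = 2.0 * cmath.exp(0.5j)
z1, z2 = relative_values(x_i, x_j)
```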

In this example, these relative values z(f,m) are based on the
arrival directions of the signals obtained from the phase differences z_1(f,m)
between the observed signals from any two sensors (sensors j1 and j2) as
follows:
FORMULA 9

$$z_3(f,m) = \cos^{-1}\left(\frac{z_1(f,m)\,v}{2\pi f d}\right)$$

and these values of z_3(f,m) are calculated by relative value calculation unit 31.
Here, v is the velocity of the signal, and d is the spacing between sensor j1
and sensor j2.
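
The mapping from a phase difference to an arrival direction can be sketched as follows, assuming the plane-wave relation in which the phase difference equals 2πfd·cos(θ)/v. The function name, the default velocity of 340 m/s and the clipping of the cosine to [-1, 1] are assumptions of this sketch.

```python
import math

def arrival_direction(z1, f, d, v=340.0):
    """Map a phase difference z1 (rad) at frequency f (Hz) to degrees.

    d is the sensor spacing in metres and v the signal velocity in m/s.
    The cosine is clipped to [-1, 1] so that noisy bins do not raise an
    error (the clipping is an addition of this sketch).
    """
    c = z1 * v / (2.0 * math.pi * f * d)
    c = max(-1.0, min(1.0, c))
    return math.degrees(math.acos(c))

# Hypothetical setup: 4 cm spacing, 1 kHz bin, source at 60 degrees.
f, d, v = 1000.0, 0.04, 340.0
z1 = 2.0 * math.pi * f * d * math.cos(math.radians(60.0)) / v
theta = arrival_direction(z1, f, d, v)
```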

The relative values z_3(f,m) calculated in this way are stored in
temporary memory unit 90. Next, clustering unit 32 sequentially reads out the
relative values z_3(f,m) from temporary memory unit 90, and clusters these
relative values z_3(f,m) into N clusters (Step S4). In this example, clustering
unit 32 produces a histogram of the relative values z_3(f,m) sent to it.

[0038] FIG.5 shows an example of a histogram produced in this way.
Note that this example corresponds to a situation where the number of source
signals is N=3.

As illustrated in this figure, this histogram consists of a
distribution with N (=3) peaks. In this example, clustering unit 32 clusters this
distribution into N (=3) clusters (clusters 91 to 93 in this example). This could,
for example, be performed by clustering based on a suitable threshold value,
or by using methods described in many textbooks such as the k-means method
or hierarchical clustering (see, e.g., Morio Onoe (trans.): "Pattern
Classification," Shingijutsu Communications, ISBN 4-915851-24-9, chapter
10). Here, each of the resulting clusters C_i (i=1,2,...,N) is a set of relative
values z_3(f,m), and can be expressed as C_i(f)={z_3(f,m) | m∈T_i} using the set T_i
of discrete time intervals.

[0039] The clustering information (clusters C_1,C_2,...,C_N) generated by
clustering unit 32 is stored in temporary memory unit 90. Representative
value calculation unit 33 reads in this information and calculates the
representative values a_1,a_2,...,a_N of each of these N clusters C_1,C_2,...,C_N (Step
S5). Specifically, it might obtain the representative values from the peak of
each cluster in the histogram, or it might obtain the representative values from
the mean of each cluster. For the sake of convenience, it is assumed that these
N representative values are then, for example, arranged in ascending order
a_1,a_2,...,a_N (see FIG.5). Note that these representative values a_1,a_2,...,a_N are the
estimated values of the arrival directions of each of the N signals.
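
Steps S4 and S5 can be sketched as follows, assuming that the k-means method is used for clustering and that the cluster mean is taken as the representative value (the text also allows thresholding, hierarchical clustering, or histogram peaks). The function name, the quantile-based initialization and the sample data are hypothetical.

```python
def cluster_directions(values, n_clusters, n_iter=50):
    """Cluster scalar relative values into n_clusters groups (1-D k-means).

    Returns (representatives, clusters): representatives are the cluster
    means in ascending order, and clusters[i] lists the values assigned to
    representatives[i].
    """
    ordered = sorted(values)
    # Initialise the centres at evenly spaced quantiles of the data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * n_clusters)]
               for i in range(n_clusters)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_clusters)]
        for z in values:
            nearest = min(range(n_clusters), key=lambda i: abs(z - centres[i]))
            clusters[nearest].append(z)
        # Keep the old centre if a cluster happens to be empty.
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    order = sorted(range(n_clusters), key=lambda i: centres[i])
    return [centres[i] for i in order], [clusters[i] for i in order]

# Hypothetical direction estimates (degrees) scattered around three sources.
z3 = [44.0, 45.0, 46.0, 89.0, 90.0, 91.0, 134.0, 135.0, 136.0]
representatives, clusters = cluster_directions(z3, n_clusters=3)
```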
[0040] In this example, the information on representative values
a_1,a_2,...,a_N is sent to mask control unit 40 after being stored in temporary
memory unit 90. In mask control unit 40, the data specifying a set G_0 whose
elements are these representative values a_1,a_2,...,a_N is substituted into a
variable SG_0, and this variable SG_0 is stored in temporary memory unit 90.
Mask control unit 40 also initializes a variable SG specifying a set G to G=∅
(the empty set), and a variable k is set to zero; these are stored in temporary
memory unit 90 (Step S6).

Next, under the control of mask control unit 40, processing is
performed in a plurality of systems (u systems) in limited signal generation
unit 50-k (k=1,...,u), limited signal separation unit 60-k and time domain
transformation unit 70-k until all N separated signals have been obtained.

[0041] First, mask control unit 40 adds 1 to the value of variable k stored
in temporary memory unit 90 to obtain a new value for variable k which is
stored back in temporary memory unit 90 (Step S7). Next, mask control unit
40 retrieves the variables SG_0 and SG from temporary memory unit 90. Then,
in mask control unit 40, a set G_k is selected consisting of V (≤M) suitable
representative values including the members of the complementary set G^c of
the set G specified by SG (the notation α^c represents the complementary set of
α), the data specifying this set G_k is assigned to variable SG_k, and this
variable SG_k is stored in temporary memory unit 90 (Step S8).
[0042] The mask generation unit 51-k of limited signal generation unit
50-k reads out the variable SG_k stored in temporary memory unit 90, and
produces a "smooth-profile mask" that extracts signals in the clusters whose
representative values are in the set G_k specified by this variable SG_k (Step
S9). Here, a "smooth-profile mask" means a function that takes high level
values with respect to relative values in a prescribed range including V
(2≤V≤M) representative values, and takes low level values with respect to
representative values that are not inside this limited range, and where the
transitions from the high level to the low level that accompany changes of the
relative value occur in a continuous fashion. Note that in this example a "high
level value" means a numerical value that is sufficiently greater than zero
(e.g., 1 or more), and a "low level value" means a value that is sufficiently
close to zero (e.g., at least 60 dB lower than the high level value), although no
particular restrictions are placed on these values.

[0043] In this embodiment, a "smooth-profile mask" is produced using
the directional characteristics of a null beamformer formed by N-V+1
sensors. This mask is a mask with a smooth profile that is sufficiently
sensitive in the direction (G_k) of the V signals included in the limited signal,
and has a low sensitivity characteristic (a null) in the direction (G_0∩G_k^c) of
the N-V signals to be eliminated.

The procedure for creating the "smooth-profile mask" of this
embodiment is described below.

First, mask generation unit 51-k reads out the variables SG_k, SG_0
and SG_k^c from temporary memory unit 90. Mask generation unit 51-k then
extracts any one of the elements (a representative value within the limited
range) of the set G_k specified by variable SG_k; this element is referred to as θ_1.
Mask generation unit 51-k also extracts all the elements of G_0∩G_k^c (the
representative values not inside the limited range) determined by variables
SG_0 and SG_k^c, and these elements are referred to as θ_i (i=2,...,N-V+1). Mask
generation unit 51-k then stores θ_1 and θ_i in temporary memory unit 90. Next,
mask generation unit 51-k extracts θ_1 and θ_i from temporary memory unit 90, and
calculates τ_ji=(d_j/v)cosθ_i (j=1,...,N-V+1). Mask generation unit 51-k also
calculates the elements ji of a delay matrix H_NBF(f) from the formula
H_NBFji(f)=exp(j2πfτ_ji) and stores them in temporary memory unit 90. In these
formulae, d_j is the distance between sensor 1 and sensor j (d_1=0), f is a
frequency variable, and v is the signal velocity. These parameters could, for
example, be pre-stored in temporary memory unit 90 and sequentially read
out for use. The above process results in the generation of an
((N-V+1)x(N-V+1)) delay matrix H_NBF(f) (FIG.3: 51a-k).

[0044] In this embodiment, since the relative values are taken to be the
arrival directions z_3(f,m) of the signals obtained from the phase difference
z_1(f,m) between the signals observed by two sensors, the abovementioned θ_1
represents the arrival direction of a signal corresponding to a representative
value inside the limited range, and θ_i represents the arrival direction of a
signal corresponding to a representative value outside the limited range.
These values of θ_i (i=1,2,...,N-V+1) are defined as shown in FIG.6. First, an
origin is set in the middle of M sensors arranged on a straight line (where L_1
is the distance from the first sensor to the origin, and L_2 is the distance from
the origin to the M-th sensor and L_1=L_2). The angle subtended between the
line connecting this origin to the i-th signal source and the line connecting the
origin to the first sensor 10 is the angle θ_i corresponding to the i-th signal
source.

[0045] The resulting delay matrix H_NBF(f) is sent from temporary
memory unit 90 (FIG.1) to NBF generation unit 51b-k (FIG.3), and NBF
generation unit 51b-k uses this delay matrix H_NBF(f) to generate an NBF
matrix W(f) having null beamformer (NBF) characteristics. This is obtained
by calculating the inverse matrix W(f) of the delay matrix H_NBF(f) using the
formula W(f)=H_NBF^{-1}(f).

This NBF matrix W(f) is stored in temporary memory unit 90
(FIG.1). A directional characteristics calculation unit 51c-k extracts the first
row elements W_1k(f) of this NBF matrix W(f) together with the values of d_k
and v from temporary memory unit 90, and generates the following
directional characteristics function for the case where θ is a variable
expressing the arrival direction of the signal:
FORMULA 10

$$F(f,\theta) = \sum_{k=1}^{N-V+1} W_{1k}(f)\,\exp\left(j 2\pi f d_k \cos\theta / v\right) \qquad (10)$$

where θ is defined in the same way as θ_i as described above.
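
The construction of the delay matrix, its inverse and the directional characteristics function of Formula (10) can be sketched as follows for the case N=3, V=2 (two sensors). The function names, the sensor spacing, the frequency and the default velocity are hypothetical, and the small Gauss-Jordan routine merely stands in for any matrix inversion method. Because W(f) is the inverse of H_NBF(f), the resulting F(f,θ) has unit gain in the direction θ_1 and a null in the eliminated direction.

```python
import cmath
import math

def delay_matrix(f, thetas_deg, d, v=340.0):
    """H[j][i] = exp(j*2*pi*f*tau_ji), tau_ji = (d[j] / v) * cos(theta_i)."""
    n = len(thetas_deg)
    return [[cmath.exp(2j * math.pi * f * (d[j] / v)
                       * math.cos(math.radians(thetas_deg[i])))
             for i in range(n)] for j in range(n)]

def invert(a):
    """Invert a small square complex matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def directivity(f, theta_deg, w_first_row, d, v=340.0):
    """F(f, theta) = sum_k W_1k(f) * exp(j*2*pi*f*d_k*cos(theta)/v)."""
    c = math.cos(math.radians(theta_deg))
    return sum(w * cmath.exp(2j * math.pi * f * dk * c / v)
               for w, dk in zip(w_first_row, d))

# Hypothetical case: keep the direction 135 degrees, null 45 degrees.
f, d = 1000.0, [0.0, 0.04]
H = delay_matrix(f, [135.0, 45.0], d)
W = invert(H)
gain_kept = abs(directivity(f, 135.0, W[0], d))
gain_nulled = abs(directivity(f, 45.0, W[0], d))
```

Evaluating abs(directivity(...)) at the relative value z_3(f,m) of each bin gives a gain of the kind used in [Mask 2] of this embodiment.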
[0046] The resulting directional characteristics function F(f,θ) is sent to
mask configuration unit 51d-k. Mask configuration unit 51d-k uses this
directional characteristics function F(f,θ) and the relative values z(f,m) (in
this example, z_3(f,m)) read out from temporary memory unit 90 to generate a
mask M_DC(f,m) with a smooth profile.

[0047] This mask M_DC(f,m) could, for example, be generated by using the
directional characteristics F(f,θ) directly as follows:

[Mask 1] M_DC(f,m) = F(f,z_3(f,m)) (11)

Or alternatively, the mask M_DC(f,m) could be generated by using the absolute
values of the directional characteristics F(f,θ) as follows:

[Mask 2] M_DC(f,m) = |F(f,z_3(f,m))| (12)

FIG.7A shows an example of [Mask 2] (for the case where the
number of signals is N=3 and the number of sensors is M=2). The "smooth-profile
mask" of this example is one that eliminates N-M=1 signal, and has a
small gain in one direction a_1. Note that the purpose of this "smooth-profile
mask" is to extract M(=V)=2 signals (in this case, the two signals arriving
from directions a_2 and a_3) as limited signals (the same applies to FIG.7B and
FIG.7C below).

[0048] Or as another example, the mask M_DC(f,m) could be generated by
transforming the directional characteristics F(f,θ) as follows. Note that in the
following, every region of the relative values z_3(f,m) between two
neighboring values of a_i in the elements of G_k is referred to as a limited
signal region. When a_1 or a_N is included in G_k, the region 0°≤z_3(f,m)≤a_1 or
180°≥z_3(f,m)≥a_N is also included in the limited signal region. Furthermore,
the regions of relative values z_3(f,m) between two neighboring values of a_i in
the elements of G_0∩G_k^c are all referred to as elimination signal regions. When
a_1 or a_N is included in G_0∩G_k^c, the region 0°≤z_3(f,m)≤a_1 or 180°≥z_3(f,m)≥a_N is
also included in the elimination signal region. Regions that do not belong to
either the limited signal region or the elimination signal region are referred to
as transitional regions.
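
One possible reading of these region definitions is sketched below: an interval between two neighboring representative values is a limited (or elimination) signal region when both of its endpoints are kept (or both eliminated), and is transitional otherwise. The function name and the example values are hypothetical.

```python
def classify_region(z, reps, selected):
    """Classify a direction z (degrees) by region.

    reps is the ascending list a_1..a_N of all representative values (G_0)
    and selected is the subset G_k kept in the limited signal. Returns
    "limited", "elimination" or "transitional".
    """
    keep = [a in selected for a in reps]
    # Below a_1 and above a_N the region follows the end representative.
    if z <= reps[0]:
        return "limited" if keep[0] else "elimination"
    if z >= reps[-1]:
        return "limited" if keep[-1] else "elimination"
    for lo, hi, k_lo, k_hi in zip(reps, reps[1:], keep, keep[1:]):
        # Both endpoints kept, or both eliminated: not a transition.
        if lo <= z <= hi and k_lo == k_hi:
            return "limited" if k_lo else "elimination"
    return "transitional"

# Hypothetical case N=3, V=2: keep a_2 and a_3, eliminate a_1.
reps = [45.0, 90.0, 135.0]
g_k = {90.0, 135.0}
regions = [classify_region(z, reps, g_k) for z in (30.0, 60.0, 100.0, 170.0)]
```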
[0049] FORMULA 11

[Mask 3]

$$M_{DC}(f,m) = \begin{cases} F(f,z_3(f,m)) & z_3(f,m) \in \text{region outside the elimination signal region} \\ F(f,\theta_r) & z_3(f,m) \in \text{elimination signal region} \end{cases} \qquad (13)$$

[Mask 4]

$$M_{DC}(f,m) = \begin{cases} |F(f,z_3(f,m))| & z_3(f,m) \in \text{region outside the elimination signal region} \\ |F(f,\theta_r)| & z_3(f,m) \in \text{elimination signal region} \end{cases} \qquad (14)$$

These mask functions M_DC(f,m) have masking properties that
uniformly reduce the gain in the elimination signal region. Here, θ_r represents
the end of the elimination signal region that is closest to the end of the
neighboring limited signal region. FIG.7B shows an example of this [Mask 4]
(for the case where the number of signals is N=3 and the number of sensors is
M=2).

[0050] It is also possible to use a mask M_DC(f,m) with uniform directional
characteristics in the limited signal region, for example:
FORMULA 12

[Mask 5]

$$M_{DC}(f,m) = \begin{cases} a & z_3(f,m) \in \text{limited signal region} \\ b & z_3(f,m) \in \text{elimination signal region} \\ F(f,z_3(f,m)) & z_3(f,m) \in \text{transitional region} \end{cases} \qquad (15)$$

Furthermore, it is possible to use the absolute value of a mask with uniform
directional characteristics in the limited signal region, such as:

[Mask 6]

$$M_{DC}(f,m) = \begin{cases} a & z_3(f,m) \in \text{limited signal region} \\ b & z_3(f,m) \in \text{elimination signal region} \\ |F(f,z_3(f,m))| & z_3(f,m) \in \text{transitional region} \end{cases} \qquad (16)$$

Here, a is set to a value sufficiently greater than zero, such as the
maximum value of |F(f,θ)| in the elimination signal region, for example, and b
is set to a small value such as the minimum value of the gain of the directional
characteristics, for example. FIG.7C shows an example of [Mask 6] (for the
case where the number of signals is N=3 and the number of sensors is M=2).
(This ends the description of mask generation unit 51-k/Step S9.)
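
The piecewise definitions of [Mask 5] and [Mask 6] can be sketched as a single function. The function name, the region labels and the constants a and b used here are hypothetical; the region label and the value of F(f,z_3(f,m)) are assumed to have been computed beforehand.

```python
def uniform_mask(region, f_value, a, b, use_abs=True):
    """Piecewise mask: a (limited), b (elimination), F or |F| (transitional).

    region is "limited", "elimination" or "transitional", f_value is the
    directional characteristic F(f, z3(f,m)) at the bin, and use_abs
    selects [Mask 6] (True) or [Mask 5] (False).
    """
    if region == "limited":
        return a
    if region == "elimination":
        return b
    return abs(f_value) if use_abs else f_value

# Hypothetical constants and directivity value for three bins.
a, b = 1.0, 0.01
f_value = 0.3 - 0.4j
gains = [uniform_mask(r, f_value, a, b)
         for r in ("limited", "transitional", "elimination")]
```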

[0051] The mask M_DC(f,m) generated by mask generation unit 51-k in
this way is stored in temporary memory unit 90 and is then sent to limited
signal extraction unit 52-k. Limited signal extraction unit 52-k also reads out
the frequency domain observed signal values X(f,m) from temporary memory
unit 90. Then, limited signal extraction unit 52-k (FIG.2) uses this mask
M_DC(f,m) and the frequency domain observed signal values X(f,m) to
generate the limited signal values X̂_k(f,m) by calculating the product
X̂_k(f,m)=M_DC(f,m)X(f,m) (Step S10).

[0052] These limited signal values X̂_k(f,m) are stored in temporary
memory unit 90, and limited signal separation unit 60-k reads out these
limited signal values X̂_k(f,m) and performs signal separation on the limited
signals (Step S11). Here, an approximation is made by assuming that the
limited signal values X̂_k(f,m)=M_DC(f,m)X(f,m) are similar to the values of
the mixed signal consisting of the signals emitted from V (2≤V≤M) signal
sources. Therefore, to estimate this separation matrix it is possible to use a
method based on independent component analysis as discussed in
[Conventional method 1]. Specifically, separation is performed using Formula
(2) mentioned in [Conventional method 1], for example, using the limited
signal values X̂_k(f,m) as the input values for independent component analysis
instead of the observed signal values X.
[0053] To perform the ICA separation in this embodiment, the limited
signal values X̂_k(f,m) are first used to generate a separation matrix W(f,m) in
ICA separation matrix estimation unit 61-k according to the abovementioned
learning rule of Formula (2), and this separation matrix W(f,m) is stored in
temporary memory unit 90. This separation matrix W(f,m) is for example
generated using feedback from the output values Y_k(f,m) from
permutation/scaling resolution unit 62-k, which is described below. The
resulting separation matrix W(f,m) is sent to permutation/scaling resolution
unit 62-k. Permutation/scaling resolution unit 62-k uses this separation matrix
W(f,m) and the limited signal values X̂_k(f,m) to generate the respective
separated signal values Y_k(f,m)=[Y_k1^Π_k1(f,m),...,Y_kV^Π_kV(f,m)]^T by performing
the calculation Y_k(f,m)=W(f,m)X̂_k(f,m), and stores them in temporary
memory unit 90. Permutation/scaling resolution unit 62-k then, for example,
feeds back these separated signal values Y_k(f,m) to resolve the permutation
problem with the method mentioned in [Conventional method 1]. After
resolving the permutation problem, permutation/scaling resolution unit 62-k
then applies tags Π_kq to the separated signal values Y_kq (q=1,...,V) to show
which source signal the separated signal values Y_kq (q=1,...,V) correspond to,
and these are stored together in temporary memory unit 90. Here, these tags
Π_kq are represented by adding the superscript Π_kq to the separated signal
values Y_kq.

[0054] For example, permutation/scaling resolution unit 62-k might
compare the estimated arrival direction θ_q of the signal, which is obtained
using the inverse matrix of separation matrix W(f) extracted from temporary
memory unit 90 (or the Moore-Penrose pseudo-inverse matrix when N≠M) by
the following formula:
FORMULA 13

$$\theta_q = \arccos \frac{\arg\left([W^{-1}]_{jq} / [W^{-1}]_{j'q}\right)}{2\pi f v^{-1} d} \qquad (17)$$

[0055] (where v is the signal velocity and d is the distance between
sensor j and sensor j')

with the representative values included in the set G_k specified by the variable SG_k
extracted from temporary memory unit 90, and associates the representative
value a_i closest to θ_q with the q-th separated signal Y_kq (Step S12). In other
words, permutation/scaling resolution unit 62-k applies tags Π_kq representing
the representative values a_i to the separated signals Y_kq (thereby
associating them with these representative values).
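
The tagging of Step S12 can be sketched as follows. The sketch assumes that the arrival direction is estimated from the phase of the ratio between two entries (for sensors j and j') of the q-th column of the inverse separation matrix; this form, the function names and the numerical values are assumptions of the sketch.

```python
import cmath
import math

def estimate_direction(a_j, a_jp, f, d, v=340.0):
    """theta_q = arccos(arg(a_j / a_jp) / (2*pi*f*d/v)), in degrees.

    a_j and a_jp are assumed to be the entries of column q of the inverse
    separation matrix for sensors j and j'; d is the offset between them.
    The cosine is clipped to [-1, 1] to tolerate estimation noise.
    """
    c = cmath.phase(a_j / a_jp) / (2.0 * math.pi * f * d / v)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def nearest_tag(theta_q, representatives):
    """Return the representative value closest to the estimated direction."""
    return min(representatives, key=lambda a: abs(a - theta_q))

# Hypothetical column of W^-1 for a source at 120 degrees, sensors 4 cm apart.
f, d, v = 1000.0, 0.04, 340.0
phase = 2.0 * math.pi * f * d * math.cos(math.radians(120.0)) / v
theta_q = estimate_direction(cmath.exp(1j * phase), 1.0 + 0.0j, f, d, v)
tag = nearest_tag(theta_q, [90.0, 135.0])
```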

[0056] After that, permutation/scaling resolution unit 62-k extracts the
separation matrix W(f) from temporary memory unit 90 and resolves the ICA
scaling problem by updating each row w_q(f) thereof as follows:

$$w_q(f) \leftarrow [W^{-1}(f)]_{jq}\, w_q(f)$$

and then stores the updated separation matrix W(f) in temporary memory unit
90. For the subsequent processing in signal combination unit 80, it is
desirable that the same value of j is used for the entire series k in this process.
[0057] The separated signal values Y_kq and their appended tags Π_kq are
sent to time domain transformation unit 70-k. Time domain transformation
unit 70-k uses, for example, a short-time inverse discrete Fourier transform or
the like to transform each of the separated signal values Y_kq (which are
obtained in the time-frequency domain) into time-domain signal values, and
stores these transformed values in temporary memory unit 90 (Step S13).
Note that these time-domain signal values y_k(t)=[y_k1^Π_k1(t),...,y_kV^Π_kV(t)]^T are
also associated with the abovementioned tags Π_kq. When these associations
are made, time domain transformation unit 70-k first extracts the tags Π_kq
associated with the frequency-domain signal values Y_kq from temporary
memory unit 90 for each frequency. Next, time domain transformation unit
70-k judges whether or not the tags Π_kq at each frequency are all the same. If
they are all the same, the time-domain signal values y_kq are tagged by
associating them with the tags Π_kq applied to the frequency-domain signal
values Y_kq. On the other hand, if they are not all the same then the tags of the
time-domain signal values y_kq are determined based on a majority decision.
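
The majority decision over the per-frequency tags can be sketched as follows. The function name and the sample tags are hypothetical; in the case of a tie this sketch simply returns the tag that was encountered first, which is an assumption because the text does not specify tie handling.

```python
from collections import Counter

def majority_tag(tags_per_frequency):
    """Return the tag that occurs most often across the frequency bins."""
    return Counter(tags_per_frequency).most_common(1)[0][0]

# Hypothetical tags of one separated signal at four frequencies.
tags = [135.0, 135.0, 90.0, 135.0]
tag = majority_tag(tags)
```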
[0058] Next, in mask control unit 40, the variables SG and SG_k are
extracted from temporary memory unit 90, and the union set G∪G_k of the sets
G and G_k represented by these variables is taken as a new set G; this set G is
substituted into variable SG, and this variable SG is stored in temporary
memory unit 90 (Step S14). Also, mask control unit 40 reads out variables SG
and SG_0 from temporary memory unit 90, and judges whether or not this new
set G is equal to set G_0 (Step S15). Here, unless G=G_0, the processing returns
to Step S7.
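
The control loop of Steps S6 to S15 can be sketched as follows. The text only requires that each set G_k contain V suitable representative values including members of the complement of G; the particular selection rule used here (take uncovered values first, then fill up with covered ones) and the function name are assumptions of this sketch.

```python
def plan_systems(representatives, v_count):
    """Choose sets G_1, G_2, ... of v_count values that together cover G_0."""
    if v_count < 1:
        raise ValueError("v_count must be at least 1")
    g0 = list(representatives)
    covered = []              # the set G, initially empty (Step S6)
    systems = []
    while set(covered) != set(g0):                       # Step S15
        # Members of the complement of G are taken first.
        uncovered = [a for a in g0 if a not in covered]
        g_k = uncovered[:v_count]
        # Fill up with other values so that G_k has v_count members.
        g_k += [a for a in g0 if a not in g_k][:v_count - len(g_k)]
        systems.append(sorted(g_k))                      # Step S8
        covered = sorted(set(covered) | set(g_k))        # Step S14
    return systems

# Hypothetical case N=3, V=2.
systems = plan_systems([45.0, 90.0, 135.0], v_count=2)
```

With three representative values and V=2, two systems are needed, and the value 45.0 appears in both of them; such duplicates are what the signal combination step later has to resolve.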

[0059] On the other hand, if G=G_0, then in signal combination unit 80,
the separated signals y_kq(t) output from each system k (time domain
transformation unit 70-k/k=1,...,u) are read out from temporary memory unit
90, and these are selected and combined to yield all N separated signals (Step
S16). For example, signal combination unit 80 might first compare the tags
Π_kq of each separated signal y_kq(t) read out from temporary memory unit 90.
Here, when it is judged that there are no separated signal values y_kq(t) that
have the same tag in a plurality of systems k, signal combination unit 80
outputs all the separated signal values y_kq(t) as the final separated signal
values y_i(t) (i=1,...,N) (Step S17). On the other hand, when it is judged that
there are separated signal values having the same tag in a plurality of systems,
signal combination unit 80 either appropriately selects one of these separated
signal values with the same tag and outputs it as a final separated signal value
y_i(t), or calculates the mean of the separated signal values with the same tag
and uses this mean value as the output signal (Step S17).
[0060] Here, in the process whereby one of the separated signal values
y_kq(t) is appropriately selected and output as the final separated signal value
y_i(t), signal combination unit 80 could, for example, determine which of the
separated signal values y_kq(t) having the same tag a_i contains the greatest
power, and output it as the final separated signal value y_i(t). Also, in the
process whereby the mean of the separated signal values having the same tag
is output as the final separated signal value y_i(t), signal combination unit 80
could, for example, use the following formula:
[0061] FORMULA 14

$$y_i(t) = \frac{1}{K} \sum_{\Pi_{kq} = a_i} y_{kq}(t)$$

(where K is the number of separated signals having the same tag a_i).
In this way, the N signals are separated with low distortion.
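
The two combination strategies of Step S17 (selection by greatest power, or averaging) can be sketched as follows. The function name, the representation of the signals as lists of samples and the sample data are hypothetical.

```python
def combine(separated, use_mean=False):
    """Merge tagged time-domain signals from all systems, one per tag.

    separated is a list of (tag, samples) pairs. For a tag that occurs
    more than once, either the signal with the greatest power is kept or
    the sample-wise mean over the K signals with that tag is taken.
    """
    by_tag = {}
    for tag, samples in separated:
        by_tag.setdefault(tag, []).append(samples)
    result = {}
    for tag, group in by_tag.items():
        if use_mean:
            k = len(group)
            result[tag] = [sum(col) / k for col in zip(*group)]
        else:
            result[tag] = max(group, key=lambda s: sum(x * x for x in s))
    return result

# Hypothetical outputs of two systems; the tag 45.0 occurs twice.
outputs = [(45.0, [1.0, -1.0]), (90.0, [0.5, 0.5]),
           (45.0, [3.0, -3.0]), (135.0, [0.0, 2.0])]
selected = combine(outputs)
averaged = combine(outputs, use_mean=True)
```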

[0062] CHARACTERISTICS OF THIS EMBODIMENT 

In the conventional method described in [Conventional method 2:
The sparsity method], the distortion of the separated signals increases when
the separation performance is increased, because when ε in the
abovementioned Formula (3) is made sufficiently small to increase the
separation performance, the signal components extracted by this binary mask
become more restricted and most of the samples that should be extracted as
components of the original signals are not extracted. In other words, many
zero components are padded into each separated signal, making the separated
signals discontinuous and generating musical noise.

[0063] On the other hand, in this embodiment mixed signals (limited
signals) consisting of any number between 2 and M original signals are
extracted by a mask having a smooth profile. Consequently, it is possible to
extract limited signals from samples over a wider range of relative
values z(f,m) than with the binary mask of [Conventional method 2] which
only extracts the values of one signal.

Therefore, even when there are two or more observed signals at
the same frequency at the same timing and the sample values are far away
from the representative values that they should basically correspond to, there
is still a high likelihood of extracting these sample values. As a result, it is
possible to suppress the degradation of quality (generation of musical noise)
caused by padding zero components into separated signals discontinuously.
[0064] Also, in this embodiment, in situations where N (N>2) signals are
mixed together and observed with M sensors, a smooth-profile mask is used
to separate and extract the signals. Unlike the mask used in [Conventional
method 2] (a binary mask with a value of 0 or 1), a mask with this smooth
profile extends smoothly at the edges. Consequently, if this smooth-profile
mask is used, then even if there are two or more observed signals at the same
frequency at a certain timing and the sample values are separated from the
representative values $a_1, \ldots, a_N$ that the samples ought to correspond
to, the mask for this position may have a nonzero value, and thus it is
possible to extract more signals than with a binary mask whose value changes
abruptly. As a result, it is possible to suppress quality degradation
resulting from zero components being padded discontinuously into the
separated signals.

[0065] Also, since the smooth-profile mask has values that become
smaller with increasing proximity to the edge parts, there is less degradation
of the separation performance than in cases where a conventional binary mask
is simply used with a greater value of $\varepsilon$.

Furthermore, since the extracted limited signals are thought to consist only
of V ($\le$M) source signals, the separation problem becomes much simpler.
Accordingly, signal separation can easily be performed on the limited signals
by using [Conventional method 1] and/or [Conventional method 3]. Also, as
described in the third embodiment discussed below, when V=1, it is not even
necessary to use [Conventional method 1] or [Conventional method 3].

[0066] PERFORMANCE COMPARISON

In the following, the performance of signal separation performed
according to [Conventional method 2] is compared in tabular form with the
performance of signal separation performed using [Mask 2] according to the
method of this embodiment.

TABLE 1

                               SIR1   SIR2   SIR3   SDR1   SDR2   SDR3
Conventional method 2          17.3   11.6   17.6    8.1    7.4    7.1
This embodiment (k=1 system)      -    5.9   17.6      -   13.4   17.4
This embodiment (k=2 system)   18.5    7.0      -   16.2   13.0      -

In this example, using speech signals from three speakers (two male and one
female) as the source signals, mixtures of these signals were produced to
simulate the results of observing them in a reverberation-free environment
with two omnidirectional microphones. In the table, SIR stands for "signal to
interference noise ratio" (dB), which is an indicator of the separation
performance. Also, SDR stands for "signal to distortion ratio" (dB), which is
an indicator of the level of distortion in the signal. For both indicators,
higher values indicate better performance. Also, SIR1 and SDR1 correspond to
speaker 1, SIR2 and SDR2 correspond to speaker 2, and SIR3 and SDR3 correspond
to speaker 3. The data for this embodiment is divided vertically into two rows
corresponding to the separation results of the k=1 system and the separation
results of the k=2 system respectively.
[0067] As this table shows, with the method of this embodiment it was
possible to obtain SDR values substantially higher than with conventional
method 2 with almost no degradation of the separation performance (SIR). This
shows that the separation could be performed with low signal distortion. It
can thus be seen that the method of this embodiment is an effective way of
separating signals with low distortion in cases where the number of signal
sources N is greater than the number of sensors M.
SECOND EMBODIMENT

This embodiment also relates to the first present invention. This
embodiment is an example in which a "smooth-profile mask" is used in the
limited signal generation unit, and a separation method based on the
estimation of a mixing matrix is used in the limited signal separation unit.
Note that in this embodiment, the description of items that are the same as in
the first embodiment is omitted.

[0068] FIG. 8 shows a block diagram illustrating just one of the systems
used to obtain V separated signal values in a signal separation device
according to this embodiment.

In FIG. 8, configurations that are the same as in the first embodiment are
labeled with the same reference numerals as in the first embodiment. As shown
by the example in FIG. 8, the signal separation device of this embodiment
differs from the signal separation device 1 of the first embodiment in that
limited signal generation unit 50-k is replaced with limited signal generation
unit 150-k, and limited signal separation unit 60-k is replaced with limited
signal separation unit 160-k. It also differs in that mask generation unit
151-k produces two types of mask, and in that the restriction V=M is imposed.
The configuration and processing of this embodiment are described below.

[0069] First, representative value generation unit 30 (FIG. 8) extracts the
frequency-domain observed signal values $X_j(f,m)$ generated by frequency
domain transformation unit 20 (FIG. 1) from temporary memory unit 90. Next,
representative value generation unit 30 (FIG. 8) calculates the relative values
z(f,m) of the observed values in relative value calculation unit 31 in the same
way as in the first embodiment, clustering is performed in clustering unit 32,
and the representative values $a_1, a_2, \ldots, a_N$ are calculated in
representative value calculation unit 33. Note that in this embodiment, for the
relative values z(f,m) it is preferable to use the arrival directions of
signals obtained from the phase differences $z_1(f,m)$ between the observed
signals at any two sensors (the i-th and j-th sensors) as follows:
FORMULA 15

$$ z_3(f,m) = \cos^{-1} \frac{z_1(f,m)\, v}{2 \pi f d} $$
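
As a rough illustration of Formula 15, the sketch below converts the phase difference between two sensor observations into an arrival direction. The function name, the default signal velocity of 340 m/s, and the sign convention (phase of the second observation relative to the first, matching a delay model of the form exp(j2πf(d/v)cos θ)) are assumptions made for this example. The cosine is clipped to [-1, 1] so that noisy values do not fall outside the domain of the inverse cosine.

```python
import cmath
import math

def arrival_direction(x_first, x_second, freq_hz, spacing_m, velocity=340.0):
    """Estimate the arrival direction z_3 (degrees) from two sensor observations.

    x_first, x_second: complex frequency-domain values at one (f, m) point.
    spacing_m: distance d between the two sensors in metres.
    """
    # z_1: phase difference between the two observations.
    z1 = cmath.phase(x_second / x_first)
    # Formula 15: z_3 = arccos(z_1 * v / (2 * pi * f * d)).
    cosine = z1 * velocity / (2.0 * math.pi * freq_hz * spacing_m)
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

The estimate is only unambiguous while 2πfd/v stays below π, i.e. while the sensor spacing is less than half a wavelength.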

[0070] These representative values $a_1, a_2, \ldots, a_N$ are stored in
temporary memory unit 90 (FIG. 1), and are then sent via mask control unit 40
(FIG. 8) to the mask generation unit 151-k of limited signal generation unit
150-k, whereby mask generation unit 151-k produces two types of mask. One is a
mask for extracting the limited signal values $\hat{X}(f,m)$ containing a
mixture of V (=M) signals corresponding to the V (=M) representative values
included in $G_k$; this is the smooth-profile mask $M_{DC}(f,m)$ shown in the
first embodiment. The other is a binary mask $M_k(f,m)$ that extracts signals
including only one signal; this is the same sort of mask as the one shown in
[Conventional method 2] and is defined as follows:
FORMULA 16

$$ M_k(f,m) = \begin{cases} 1 & a_k - \varepsilon \le z(f,m) \le a_k + \varepsilon \\ 0 & \text{otherwise} \end{cases} \quad (k = 1, \ldots, N) \qquad (18) $$

These masks are stored in temporary memory unit 90 (FIG. 1).
[0071] Next, limited signal extraction unit 152-k (FIG. 8) reads out the
smooth-profile mask $M_{DC}(f,m)$ and the frequency-domain observed signal
values X(f,m) from temporary memory unit 90 (FIG. 1). Limited signal
extraction unit 152-k (FIG. 8) then calculates the limited signal values
$\hat{X}(f,m) = M_{DC}(f,m) X(f,m)$ by multiplying the frequency-domain
observed signal values X(f,m) by this mask $M_{DC}(f,m)$, and stores the
results in temporary memory unit 90 (FIG. 1). Here, since these limited signal
values $\hat{X}(f,m)$ are approximated by a mixture of V signals, the
separation of signals in limited signal separation unit 160-k can be performed
by applying the mixing matrix estimation method discussed in [Conventional
method 3].
[0072] Therefore, in multiplication arithmetic unit 161-k of limited signal
separation unit 160-k (FIG. 8), the binary mask $M_k(f,m)$ and frequency-domain
observed signal values X(f,m) are first read out from temporary memory unit
90 (FIG. 1). Then, multiplication arithmetic unit 161-k (FIG. 8) determines the
separated signal values $\hat{X}_k(f,m)$ including just one signal by
performing the calculation $\hat{X}_k(f,m) = M_k(f,m) X(f,m)$, and stores them
in temporary memory unit 90 (FIG. 1). Next, mixing process estimation unit
162-k (FIG. 8) reads out $\hat{X}_k(f,m)$ from temporary memory unit 90
(FIG. 1) and calculates the estimated mixing matrix $\hat{H}$ in the same way
as in [Conventional method 3] as follows:
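FORMULA 17

$$ \hat{H}_{ji}(f) = E\left[\frac{M_i(f,m_i)\,X_j(f,m_i)}{M_i(f,m_i)\,X_1(f,m_i)}\right] = E\left[\frac{\hat{X}_j(f,m_i)}{\hat{X}_1(f,m_i)}\right] = E\left[\frac{H_{ji}(f)\,S_i(f,m_i)}{H_{1i}(f)\,S_i(f,m_i)}\right] = E\left[\frac{H_{ji}(f)}{H_{1i}(f)}\right] $$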

This mixing matrix $\hat{H}$ is an M×N matrix (M rows, one per sensor, and N
columns, one per source signal). It is not necessary to determine this mixing
matrix for every series k; instead, the matrix $\hat{H}$ estimated for one
series may be stored in temporary memory unit 90 and read out when required.
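
The estimate of Formula 17 for one source can be sketched as below: the binary mask of Formula 16 selects the time frames in which only that source is assumed to be active, and the expectation is replaced by a sample mean of the ratio of each sensor observation to the first sensor observation. The function name and data layout are hypothetical, and the sketch assumes that the first sensor value is nonzero in every selected frame.

```python
def estimate_mixing_column(frames, z_values, a_k, eps):
    """Estimate one column of the mixing matrix at a single frequency.

    frames: list of tuples (X_1, ..., X_M) of complex observations, one per frame m.
    z_values: relative value z(f,m) for each frame.
    a_k, eps: centre and half-width of the binary mask M_k(f,m).
    Returns [H_1k, ..., H_Mk] normalised so that the first element is 1.
    """
    # Binary mask: keep frames whose relative value lies in [a_k - eps, a_k + eps].
    selected = [x for x, z in zip(frames, z_values) if a_k - eps <= z <= a_k + eps]
    if not selected:
        raise ValueError("no frames fall inside the binary mask")
    sensors = len(selected[0])
    # Sample mean of X_j / X_1 over the selected frames (stands in for E[.]).
    return [sum(x[j] / x[0] for x in selected) / len(selected) for j in range(sensors)]
```

Because each ratio is taken against the first sensor, the estimated column is only determined up to the scaling implied by $H_{1k}(f)$.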

[0073] This mixing matrix $\hat{H}$ is sent to inverse matrix calculation unit
163-k, and inverse matrix calculation unit 163-k first reduces this mixing
matrix to a square matrix. Specifically, a V×V square matrix $\hat{H}_M$ is
produced by only reading out the V columns of the mixing matrix $\hat{H}$ that
correspond to the limited signals $\hat{X}(f,m)$ consisting of V signals (i.e.,
the columns corresponding to the V representative values $a_i$ included in
$G_k$). This is done in order to separate the limited signals $\hat{X}(f,m)$
approximated by a mixture of V signals.

[0074] Next, inverse matrix calculation unit 163-k calculates the inverse
matrix $\hat{H}_M^{-1}(f)$ of this square matrix $\hat{H}_M$, and stores it in
temporary memory unit 90 (FIG. 1). Multiplication arithmetic unit 164-k
(FIG. 8) reads out the limited signal values $\hat{X}(f,m)$ and inverse matrix
$\hat{H}_M^{-1}(f)$ from temporary memory unit 90 (FIG. 1), and performs the
calculation $Y_k(f,m) = \hat{H}_M^{-1}(f)\hat{X}(f,m)$ to calculate the
estimated values of the V separated signals
$Y_k(f,m) = [Y_{k1}^{\Pi_{k1}}(f,m), \ldots, Y_{kV}^{\Pi_{kV}}(f,m)]^T$. Note
that the appending of tag information to indicate which source signals the
separated signals $Y_{kq}$ (q=1,...,V) correspond to is performed by using
$\hat{H}_M$ instead of $W^{-1}$ in the abovementioned Formula (17) to determine
the estimated arrival direction of the signal, and then judging which of the
representative values $a_i$ this direction is close to.
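
For the case V=M=2, the column selection and inversion described above can be sketched as follows. The helper names are hypothetical, the mixing matrix is assumed to be stored with one row per sensor and one column per source, and the explicit 2×2 inverse is used only to keep the example short.

```python
def invert_2x2(h):
    """Inverse of a 2x2 complex matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("selected columns are (nearly) linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def separate_limited(h_full, columns, x_limited):
    """Separate a limited signal containing V=2 sources at one (f, m) point.

    h_full: estimated mixing matrix, one row per sensor, one column per source.
    columns: indices of the two sources whose representative values are in G_k.
    x_limited: the two limited signal values (one per sensor).
    """
    # Square matrix H_M: keep only the columns of the sources present in G_k.
    h_m = [[row[c] for c in columns] for row in h_full]
    w = invert_2x2(h_m)
    # Y_k = H_M^{-1} X_hat
    return [w[0][0] * x_limited[0] + w[0][1] * x_limited[1],
            w[1][0] * x_limited[0] + w[1][1] * x_limited[1]]
```

If the limited signal still contains a residue of a source outside $G_k$, that residue is not removed by this step, which is why the quality of the smooth-profile mask matters.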

[0075] THIRD EMBODIMENT

This embodiment is also an embodiment relating to the first present
invention. In this embodiment, a "smooth-profile mask" is used to extract only
a signal consisting of the signal emitted from any one signal source (called a
"limited signal" in this embodiment) from the observed signals, and the
extracted limited signal is taken to be the separated signal. Note that in
this embodiment, the description of items that are the same as in the first
embodiment is omitted.

FIG. 9 is a block diagram showing an example of one system part of a signal
separation device according to this embodiment, whereby one separated signal
is obtained. Note that in FIG. 9, configurations that are the same as in the
first embodiment are labeled with the same reference numerals as in the first
embodiment.

[0076] As shown by the example in FIG. 9, the signal separation device of
this embodiment differs from the signal separation device 1 of the first
embodiment in that limited signal generation unit 50-k is replaced with
limited signal generation unit 250-k, and in that there is no limited signal
separation unit 60-k in the signal separation device of this embodiment. The
configuration and processing of this embodiment are described below.

First, representative value generation unit 30 (FIG. 9) extracts the
frequency-domain observed signal values $X_j(f,m)$ generated by frequency
domain transformation unit 20 from temporary memory unit 90 (FIG. 1).
Representative value generation unit 30 (FIG. 9) calculates the relative values
z(f,m) of the observed values in relative value calculation unit 31 in the same
way as in the first embodiment, clustering is performed in clustering unit 32,
and the representative values $a_1, a_2, \ldots, a_N$ are calculated in
representative value calculation unit 33. For the relative values z(f,m), it is
possible to use the phase differences and/or amplitude ratios, or a mapping
thereof (e.g., the arrival directions of signals as determined from the phase
differences). The relative values used in this embodiment are the arrival
directions of signals as determined from the phase differences between the
observed signals as follows:
FORMULA 18

$$ z_3(f,m) = \cos^{-1} \frac{z_1(f,m)\, v}{2 \pi f d} $$

[0077] These representative values $a_1, a_2, \ldots, a_N$ are stored in
temporary memory unit 90 (FIG. 1), and these representative values are then
read out by mask generation unit 251-k of limited signal generation unit 250-k
(FIG. 9), whereby mask generation unit 251-k produces a "smooth-profile mask"
for extracting any one of these representative values $a_i$. Note that the
"smooth-profile mask" of this embodiment is a function that takes a high level
value for relative values in a limited range including V (V=1) representative
values, and takes a low level value for representative values that are not
inside this limited range, and where the transitions from the high level value
to the low level value that accompany changes of the relative value occur in a
continuous fashion.



[0078] A technique for generating a "smooth-profile mask" according to
this embodiment is described below.

First, mask generation unit 251-k generates an (N×N) delay matrix
$H_{NBF}(f)$. Specifically, mask generation unit 251-k extracts one of the
representative values $a_1, a_2, \ldots, a_N$ (estimated values of the arrival
directions of extracted signals) stored in temporary memory unit 90 (FIG. 1),
which is denoted as $\theta_1$. Mask generation unit 251-k also extracts the
other N-1 representative values (the estimated values of the arrival
directions of the signals that are not extracted) from temporary memory unit
90 (FIG. 1), which are denoted as $\theta_i$ (i=2,...,N). These values of
$\theta_1$ and $\theta_i$ are stored in temporary memory unit 90 (FIG. 1). Mask
generation unit 251-k sequentially extracts $\theta_1$ and $\theta_i$ from
temporary memory unit 90, calculates $\tau_{ji} = (d_j/v)\cos\theta_i$
(j=1,...,N) and the elements at (j,i) in delay matrix $H_{NBF}(f)$,
$H_{NBF\,ji}(f) = \exp(j 2\pi f \tau_{ji})$ (where the j in the exponent is
the imaginary unit), and sequentially stores the results in temporary memory
unit 90. Here, $d_j$ is the distance between sensor 1 and sensor j ($d_1 = 0$),
f is a frequency variable, and v is the signal velocity. These parameters
could, for example, be pre-stored in temporary memory unit 90 and sequentially
read out when required. The above process results in the generation of an
(N×N) delay matrix $H_{NBF}(f)$.
[0079] Next, mask generation unit 251-k uses this delay matrix $H_{NBF}(f)$ to
produce an NBF matrix W(f) with null beamformer (NBF) characteristics.
This is obtained by calculating the inverse matrix W(f) of the delay matrix
$H_{NBF}(f)$ using the formula $W(f) = H_{NBF}^{-1}(f)$. This NBF matrix
$W(f) = H_{NBF}^{-1}(f)$ is stored in temporary memory unit 90. Then, mask
generation unit 251-k sequentially extracts the first row elements $W_{1k}(f)$
of NBF matrix W(f) and the values of $d_k$ and v from temporary memory unit 90,
and generates the directional characteristics function $F(f,\theta)$ shown in
Formula (10) above. After that, mask generation unit 251-k uses this
directional characteristics function $F(f,\theta)$ to generate a smooth-profile
mask $M_{DC}(f,m)$.
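
The construction of paragraphs [0078] and [0079] can be sketched as follows. Formula (10) is not reproduced in this section, so the form used for the directional characteristics function, $F(f,\theta) = \sum_k W_{1k}(f)\exp(j 2\pi f d_k \cos\theta / v)$, is an assumption, as are the function names, the Gauss-Jordan inversion routine and the default velocity of 340 m/s. Under that assumption the first row of W(f) gives unit gain in direction $\theta_1$ and nulls in the directions $\theta_2, \ldots, \theta_N$.

```python
import cmath
import math

def delay_matrix(thetas_deg, positions_m, freq_hz, velocity=340.0):
    """H_NBF(f): element (j, i) is exp(j*2*pi*f*tau_ji), tau_ji = (d_j / v) * cos(theta_i)."""
    return [[cmath.exp(2j * math.pi * freq_hz * (d_j / velocity)
                       * math.cos(math.radians(theta_i)))
             for theta_i in thetas_deg]
            for d_j in positions_m]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small complex matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("delay matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def directivity(w_first_row, positions_m, freq_hz, theta_deg, velocity=340.0):
    """Assumed form of F(f, theta) built from the first row of the NBF matrix."""
    c = math.cos(math.radians(theta_deg))
    return sum(w * cmath.exp(2j * math.pi * freq_hz * d * c / velocity)
               for w, d in zip(w_first_row, positions_m))
```

A smooth-profile mask is then obtained by evaluating this function (or its magnitude) at the relative value $z_3(f,m)$ of each time-frequency point.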

[0080] For example, a mask represented by Formula (11) (referred to as
[Mask 7]) or a mask represented by Formula (12) (referred to as [Mask 8]) in
the first embodiment might be generated as the smooth-profile mask
$M_{DC}(f,m)$ of this embodiment.

Also, for example, it is possible to generate a [smooth-profile mask] having
characteristics whereby the gain in the elimination signal region is made
uniformly small as follows:
FORMULA 19
MASK 9

$$ M_{DC}(f,m) = \begin{cases} F(f, z_3(f,m)) & z_3(f,m) \in \text{region outside the elimination signal region} \\ F(f, \theta_r) & z_3(f,m) \in \text{elimination signal region} \end{cases} \qquad (19) $$

MASK 10

$$ M_{DC}(f,m) = \begin{cases} |F(f, z_3(f,m))| & z_3(f,m) \in \text{region outside the elimination signal region} \\ |F(f, \theta_r)| & z_3(f,m) \in \text{elimination signal region} \end{cases} \qquad (20) $$

[0081] Here, of the estimated values of the arrival directions of the N-1
signals that are to be eliminated (i.e., the N-1 representative values other
than the representative value $a_i$ to be extracted), $\theta_r$ is the one
closest to the estimated value of the arrival direction of the signal that is
not eliminated (the extracted representative value $a_i$).

It is also possible to use a mask $M_{DC}(f,m)$ with uniform directional
characteristics in the extracted direction such as, for example:
FORMULA 20
MASK 11

$$ M_{DC}(f,m) = \begin{cases} |F(f, \theta_1)| & z_3(f,m) \in \text{region outside the elimination signal region} \\ |F(f, \theta_r)| & z_3(f,m) \in \text{elimination signal region} \\ F(f, z_3(f,m)) & z_3(f,m) \in \text{transitional region} \end{cases} \qquad (21) $$

It is also possible to use $M_{DC}(f,m) = |F(f, z_3(f,m))|$ in the transitional
region ([Mask 12]).

[0082] Examples of the abovementioned [Mask 8] and [Mask 12] are shown in
FIG. 10. These are examples of "smooth-profile masks" that extract signals
arriving from direction $a_1$ and suppress signals arriving from directions
$a_2$ and $a_3$ when the number of signals is N=3 and the number of sensors is
M=2.

The smooth-profile mask $M_{DC}(f,m)$ generated in mask generation unit 251-k
is sent to limited signal extraction unit 252-k, and limited signal extraction
unit 252-k extracts the separated signal $Y_k(f,m)$ according to the formula
$Y_k(f,m) = M_{DC}(f,m) X_j(f,m)$.

The above process is performed in a plurality of systems until all the
separated signals have been extracted, finally yielding all the separated
signals Y(f,m). The signal separation device then restores these separated
signals Y(f,m) to the time domain in the time domain transformation unit, and
these signals are output by passing them straight through the signal
combination unit.

[0083] PERFORMANCE COMPARISON

In the following, the performance of signal separation performed according to
[Conventional method 2] is compared in tabular form with the performance of
signal separation performed using [Mask 8] and [Mask 11] according to the
method of this embodiment.
TABLE 2

                             SIR1   SDR1   SIR2   SDR2   SIR3   SDR3
Conventional method 2        15.0    7.9   10.3   11.1   17.3    9.0
This embodiment (Mask 8)     14.8   12.1    5.9   17.2   14.6   11.1
This embodiment (Mask 11)    15.4   13.0    8.3   16.1   16.1   11.4

In this example, using speech signals from three speakers (two male and one
female) as the source signals, mixtures of these signals were produced to
simulate the results of observing them in a reverberation-free environment
with two non-directional microphones.
TABLE 3

                             SIR1   SDR1   SIR2   SDR2   SIR3   SDR3
Conventional method 2        15.1   11.3    9.0   13.3   13.4    9.2
This embodiment (Mask 8)     14.6   11.4    5.5   17.2   14.3   11.6
This embodiment (Mask 11)    15.5   12.2    7.9   16.0   15.4   11.7

This example shows the simulated results obtained under the same conditions
as in Table 2 except that the signals were mixed differently (specifically,
the positional arrangement of the speakers was changed).
TABLE 4

                             SIR1   SDR1   SIR2   SDR2   SIR3   SDR3
Conventional method 2        11.0    7.7    4.3   10.8   13.4    6.4
This embodiment (Mask 8)     10.8    7.8    2.7   16.5   12.9    7.6
This embodiment (Mask 11)    12.0    8.7    3.5   15.7   14.9    7.1

This example shows the simulated results obtained under the same conditions
as in Table 2 except for the combination of speakers (three male).
[0084] As these tables show, with the method of this embodiment it was
possible to obtain SDR values substantially higher than with conventional
method 2 with almost no degradation of the separation performance (SIR). This
shows that separation could be performed with low signal distortion. It can
thus be seen that the method of this embodiment is an effective way of
separating signals with low distortion in cases where the number of signal
sources N is greater than the number of sensors M.
[0085] FOURTH EMBODIMENT

This embodiment also relates to the first present invention. In this
embodiment, a smooth-profile mask is generated by convolving a binary
mask with a smooth profile function. In the following, only the processing
performed in the mask generation unit (equivalent to mask generation unit
51-k in FIG. 1) is described. The other configurations and processes are the
same as in the first through third embodiments. In this embodiment, the
relative values z(f,m) could be derived from the parameters described in the
first embodiment, such as the phase differences $z_1(f,m)$, amplitude ratios
$z_2(f,m)$, or the arrival directions $z_3(f,m)$ obtained from the phase
differences $z_1(f,m)$.

[0086] FIG. 11 shows a block diagram of an example of the configuration
of mask generation unit 300-k in this embodiment.

When the processing of mask generation unit 300-k is started, binary mask
preparation unit 301-k first generates a binary mask which is a function that
takes a high level value for relative values inside a prescribed range
including V representative values and a low level value for relative values
that are not inside this range, and where the transitions from the high level
value to the low level value that accompany changes of the relative value
occur in a discontinuous fashion. For example, mask generation unit 300-k
might generate a binary mask for extracting signals consisting of V mixed
signals according to the following formula:

FORMULA 21

$$ F_b(z) = \begin{cases} 1 & a_{\min} \le z \le a_{\max} \\ 0 & \text{otherwise} \end{cases} $$



[0087] When extracting a signal that includes V representative values
from $a_{k+1}$ to $a_{k+V}$, the parameters $a_{\min}$ and $a_{\max}$ could,
for example, be set in the ranges $a_k < a_{\min} < a_{k+1}$ and
$a_{k+V} < a_{\max} < a_{k+V+1}$. These parameters may be set appropriately,
but more specifically $a_{\min}$ and $a_{\max}$ can be calculated by the
following process, for example.

[0088] First, mask generation unit 300-k reads in the relative values
z(f,m), clusters $C_i$ and representative values $a_i$ (i=1,...,N) from
temporary memory unit 90 (FIG. 1) (see Steps S3-5 of the first embodiment), and
calculates the variance value of each cluster $C_i$ from the following
calculation:
FORMULA 22

$$ \sigma_i^2(f) = \frac{1}{|C_i|} \sum_{m \in T_i} \left( z(f,m) - a_i(f) \right)^2 \qquad (22) $$

where $|C_i|$ is the number of relative values z(f,m) that belong to cluster
$C_i$. This variance value can also be obtained by using, for example, an EM
algorithm (see, e.g., Morio Onoe (trans.): "Pattern Classification,"
Shingijutsu Communications, ISBN 4-915851-24-9, chapter 10) or the like, and
fitting the data to a Gaussian model.

The calculated variance values $\sigma_i^2$ are stored in temporary memory
unit 90 (FIG. 1), and then mask generation unit 301-k (FIG. 11) reads in the
variance values $\sigma_i^2$ and representative values $a_i$ (in this example,
the mean values of the clusters $C_i$) stored in temporary memory unit 90, and
uses them to perform the following calculation:
FORMULA 23

$$ a_{\min} = \frac{\sigma_{k+1} \cdot a_k + \sigma_k \cdot a_{k+1}}{\sigma_{k+1} + \sigma_k}, \qquad a_{\max} = \frac{\sigma_{k+V} \cdot a_{k+V+1} + \sigma_{k+V+1} \cdot a_{k+V}}{\sigma_{k+V} + \sigma_{k+V+1}} \qquad (23) $$

(This ends the description of the specific calculation example of $a_{\min}$
and $a_{\max}$.)
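
The calculation of Formulas 22 and 23 can be sketched as below. The function names and the 0-based list layout are assumptions for the example: `a` and `sigma` hold the representative values and standard deviations of the clusters sorted by representative value, and the extracted range covers `a[k+1]` through `a[k+v]`, so a neighbouring cluster is assumed to exist on both sides.

```python
import math

def cluster_std(values, centroid):
    """Standard deviation of one cluster about its representative value (Formula 22)."""
    return math.sqrt(sum((z - centroid) ** 2 for z in values) / len(values))

def mask_limits(a, sigma, k, v):
    """Lower and upper limits of the binary mask (Formula 23).

    Each limit divides the gap between two neighbouring clusters in proportion
    to their spreads, so the boundary lies farther from the wider cluster.
    """
    a_min = (sigma[k + 1] * a[k] + sigma[k] * a[k + 1]) / (sigma[k + 1] + sigma[k])
    a_max = ((sigma[k + v] * a[k + v + 1] + sigma[k + v + 1] * a[k + v])
             / (sigma[k + v] + sigma[k + v + 1]))
    return a_min, a_max
```

When the two neighbouring clusters have equal spread, each limit falls at the midpoint between their representative values.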

[0089] The binary mask $F_b(z)$ generated in this way is stored in
temporary memory unit 90 (FIG. 1).

Next, a single-peak function generator unit 302-k (FIG. 11) generates a
single-peak function g(z) whose value changes continuously with changes in z,
and stores it in temporary memory unit 90 (FIG. 1). As an example of
single-peak function g(z), it is possible to use a function with a smooth
profile such as the following Gaussian function:
FORMULA 24

$$ g(z) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{z^2}{2\sigma^2} \right) $$

Here, $\sigma$ represents the standard deviation of g(z). For example, when
extracting the range $a_{k+1}$ through $a_{k+V}$, it is preferable to set
$\sigma$ to a suitable value so that $a_{\min} - \sigma > a_k + \sigma_k$ and
$a_{\max} + \sigma < a_{k+V+1} - \sigma_{k+V+1}$, e.g., by choosing $\sigma$ so
that $\sigma = \min(\sigma_k, \sigma_{k+V+1})$. Here, $\sigma_k$ and
$\sigma_{k+V+1}$ are given by Formula (22). Also, the notation
$\min(\alpha, \beta)$ represents the operation of extracting the smaller of
the values $\alpha$ and $\beta$.
[0090] Next, a convolutional mixing unit 303-k (FIG. 11) reads in binary
mask $F_b(z)$ and single-peak function g(z) from temporary memory unit 90
(FIG. 1), calculates the function $F(z) = F_b(z) * g(z)$ by convolving this
binary mask $F_b(z)$ with the single-peak function g(z), and stores it in
temporary memory unit 90 (FIG. 1). Here, the symbol * is a convolution
operator for the variable z.

After that, mask configuration unit 304-k (FIG. 11) reads in the relative
values z(f,m) and function F(z) from temporary memory unit 90 (FIG. 1),
generates a mask by substituting the relative values z(f,m) into function F(z)
as follows:

$$ M_{DC}(f,m) = F(z(f,m)) \qquad (24) $$

and stores this mask in temporary memory unit 90 (FIG. 1).
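
When g(z) is a Gaussian normalised to unit area, the convolution of the rectangular binary mask with g(z) has a closed form in terms of the error function, which avoids a numerical convolution. The sketch below relies on that assumption about the normalisation of g(z); the function name is hypothetical.

```python
import math

def smooth_mask(z, a_min, a_max, sigma):
    """F(z) = (F_b * g)(z) for a rectangle on [a_min, a_max] and a unit-area Gaussian.

    The convolution integral of the Gaussian over [a_min, a_max] evaluates to
    0.5 * [erf((z - a_min) / (sigma * sqrt(2))) - erf((z - a_max) / (sigma * sqrt(2)))].
    """
    scale = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((z - a_min) / scale) - math.erf((z - a_max) / scale))
```

The mask value of Formula (24) at a time-frequency point is then `smooth_mask(z, a_min, a_max, sigma)` with z set to the relative value z(f,m). The result is close to 1 well inside the range, passes through about 0.5 at each limit when the range is wide compared with σ, and decays smoothly outside.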

[0091] Alternatively, the mask of Formula (24) could be obtained using a
smooth-profile function F(z) that produces a mask profile where the ends of a
binary mask $F_b(z)$ have straight-line (or curved) segments with a given
gradient.

[0092] The mask of Formula (24) could also be obtained by using mask
configuration unit 304-k (FIG. 11) to read in the representative values $a_i$
(in this example, the mean values of clusters $C_i$) and the values of
$a_{\min}$, $a_{\max}$ and variance values $\sigma_i^2$ obtained as shown in
Formula (22) and Formula (23), calculate a Gaussian function with a mean of
$a_i(f)$ and a variance of $\sigma_i^2(f)$:
FORMULA 25

$$ g_i(z) = \frac{1}{\sqrt{2\pi\sigma_i^2(f)}} \exp\left\{ -\frac{(z - a_i(f))^2}{2\sigma_i^2(f)} \right\} $$

normalize this Gaussian function by replacing $g_i(z)$ with $g_i(z)/g_i(a_i)$
so that the value at $a_i$ becomes 1, and then perform the following
calculation:

$$ F(z) = \begin{cases} g_k(z) & a_{\min} > z \\ 1 & a_{\min} \le z \le a_{\max} \\ g_{k+V}(z) & a_{\max} < z \end{cases} $$

[0093] FIFTH EMBODIMENT

This embodiment also relates to the first present invention. In this
embodiment, a smooth-profile mask is generated from the differences
between odd functions. In the following, only the processing performed in the
mask generation unit (equivalent to mask generation unit 51-k in FIG. 1) is
described. Note that the other configurations and processes are the same as in
the first through third embodiments.

The mask generation unit of this embodiment generates a smooth-profile mask
from a single-peak function obtained by mapping the differences between a
first odd function that is zero when the relative value is the lower limit
value $a_{\min}$ of a limited range and a second odd function that is zero when
the relative value is the upper limit value $a_{\max}$ of a limited range. For
example, a "smooth-profile mask" could be made using the following function:

$$ M_{DC}(f,m) = \left\{ \tanh(z(f,m) - a_{\min}) - \tanh(z(f,m) - a_{\max}) \right\}^{\alpha} $$

Note that for the relative values z(f,m), this embodiment uses the phase
differences $z_1(f,m)$ and/or amplitude ratios $z_2(f,m)$ shown in the first
embodiment and the like, or a mapping thereof (e.g., the arrival directions
$z_3(f,m)$ of signals as determined from the phase differences, or the like).
Also, $\alpha$ is any positive number, and the values of $a_{\min}$ and
$a_{\max}$ are obtained in the same way as in the fourth embodiment. If
necessary, normalization may also be performed using a formula such as the
following:

$$ M_{DC}(f,m) = M_{DC}(f,m) / \max(M_{DC}(f,m)) $$
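
The mask of this embodiment and its optional normalization can be sketched as follows. The function names are hypothetical, and the normalization here divides by the largest mask value among the supplied relative values, which is one possible reading of max(M_DC(f,m)).

```python
import math

def tanh_mask(z, a_min, a_max, alpha=1.0):
    """{tanh(z - a_min) - tanh(z - a_max)}^alpha for one relative value z."""
    # The difference is non-negative whenever a_min < a_max, because tanh is increasing.
    return (math.tanh(z - a_min) - math.tanh(z - a_max)) ** alpha

def normalized_masks(z_values, a_min, a_max, alpha=1.0):
    """Mask values for a list of relative values, scaled so that the largest is 1."""
    raw = [tanh_mask(z, a_min, a_max, alpha) for z in z_values]
    peak = max(raw)
    return [m / peak for m in raw]
```

Larger values of α sharpen the transitions at the limits, while smaller values widen them.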
[0094] SIXTH EMBODIMENT

This embodiment also relates to the first present invention. A mask
according to this embodiment is generated in mask generation unit 51-k shown
in FIG. 1 and FIG. 2, and is a function (binary mask) that takes a high level
value for relative values inside a prescribed range including V representative
values and a low level value for representative values that are not inside
this prescribed range, and where the transitions from the high level value to
the low level value occur in a discontinuous fashion. Here, $2 \le V \le M$.
As a specific example, a mask could be generated according to the following
formula:
FORMULA 26

$$ B(f,m) = \begin{cases} 1 & a_{\min} \le z(f,m) \le a_{\max} \\ 0 & \text{otherwise} \end{cases} \qquad (25) $$

Note that when extracting a signal including V representative values from
$a_{k+1}$ to $a_{k+V}$, the parameters $a_{\min}$ and $a_{\max}$ could, for
example, be set in the ranges $a_k < a_{\min} < a_{k+1}$ and
$a_{k+V} < a_{\max} < a_{k+V+1}$. More specifically, $a_{\min}$ and $a_{\max}$
could, for example, be generated according to the same procedure as the method
discussed in the fourth embodiment. In this embodiment, the relative values
z(f,m) could be derived from parameters such as the phase differences
$z_1(f,m)$, amplitude ratios $z_2(f,m)$, or the arrival directions $z_3(f,m)$
obtained from the phase differences $z_1(f,m)$.

[0095] Furthermore, the number of representative values included in the
range from $a_{\min}$ to $a_{\max}$ should be at least 2 and no greater than
the number of sensors M, and should preferably be equal to the number of
sensors M. As in the first embodiment, a plurality of binary masks B(f,m) are
generated in this embodiment.

[0096] As a specific example, mask control unit 40 (FIG. 1, FIG. 2) reads
out the representative values $a_1, a_2, \ldots, a_N$ from temporary memory
unit 90, substitutes the data specifying a set $G_0$ whose elements are these
representative values into a variable $SG_0$, and stores this variable $SG_0$
in temporary memory unit 90. Mask control unit 40 also initializes a variable
SG specifying a set G to $G = \emptyset$ (the empty set), and a variable k is
set to zero; these are stored in temporary memory unit 90 (FIG. 4; Step S6).
Next, under the control of mask control unit 40, processing is performed in a
plurality of systems (u systems) in limited signal generation unit 50-k
(k=1,...,u), limited signal separation unit 60-k and time domain
transformation unit 70-k until all N separated signals have been obtained.
First, mask control unit 40 adds 1 to the value of variable k stored in
temporary memory unit 90 to obtain a new value for variable k which is stored
back in temporary memory unit 90 (FIG. 4; Step S7). Next, mask control unit 40
retrieves the variables $SG_0$ and SG from temporary memory unit 90. Then, in
mask control unit 40, a set $G_k$ is selected consisting of V ($\le$M)
suitable representative values including the members of the complementary set
$G^c$ of the set G specified by SG (the notation $\alpha^c$ represents the
complementary set of $\alpha$), the data specifying this set $G_k$ is
substituted into variable $SG_k$, and this variable $SG_k$ is stored in
temporary memory unit 90 (FIG. 4; Step S8). The mask generation unit 51-k of
limited signal generation unit 50-k reads out the variable $SG_k$ stored in
temporary memory unit 90, and produces a binary mask that extracts signals in
the clusters whose representative values are in the set $G_k$ specified by
this variable $SG_k$ (FIG. 4; Step S9).
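
The loop of Steps S6 to S9 can be sketched as below. Two points are assumptions rather than statements of this section: that the set G is updated to include $G_k$ after each system has been processed, and that each $G_k$ is chosen as a window of V adjacent representative values so that a single range $[a_{\min}, a_{\max}]$ can cover it. The function name is hypothetical, and the representative values are assumed to be distinct.

```python
def plan_systems(representatives, v):
    """Choose the sets G_1, G_2, ... of V representative values each.

    Every new set contains at least one representative value that is not yet
    covered (a member of the complement of G), and the loop ends once all N
    representative values are covered.
    """
    ordered = sorted(representatives)
    n = len(ordered)
    if not 1 <= v <= n:
        raise ValueError("V must be between 1 and N")
    covered = set()          # the set G, initially empty (Step S6)
    groups = []
    while len(covered) < n:
        # First representative value still in the complement of G.
        first_missing = next(i for i, a in enumerate(ordered) if a not in covered)
        start = min(first_missing, n - v)
        g_k = ordered[start:start + v]   # V adjacent values (Step S8)
        groups.append(g_k)
        covered.update(g_k)              # assumed update of G
    return groups
```

For N=3 and V=2 this yields two systems whose sets overlap in one representative value, so one source is separated twice and is later resolved by the signal combination unit.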

[0097] FIG. 12A shows an example of a binary mask in this embodiment.
This is an example of a binary mask that takes a high level value (e.g., 1)
for relative values z_3(f,m) inside a prescribed range that includes two
representative values a_1 and a_2, and takes a low level value (e.g., 0) for
representative value a_3, which is not inside this prescribed range. The
vertical axis of this figure represents the gain of the binary mask, and the
horizontal axis represents the relative value z_3(f,m) (the DOA (direction of
arrival) of the signal in degrees). As this figure shows, the high level
values of this binary mask are flat, and the transitions between the high
level value and the low level value are discontinuous.

[0098] Note that the other configurations and processes are the same as in
the first and second embodiments. Specifically, this embodiment extracts the
values of a mixed signal (called a "limited signal" in this embodiment)
comprising the signals emitted by V signal sources from the frequency domain
values by using a binary mask B(f,m) instead of the smooth-profile mask
M_DC(f,m) used in the first and second embodiments, and then performs the
processing of the first or second embodiment.

Also, the processing whereby binary mask B(f,m) is used to extract the values
of a mixed signal comprising the signals emitted by V signal sources from the
frequency domain signal values is performed by multiplying the frequency
domain observed signal values X_j(f,m) by the binary mask B(f,m)
(X̂(f,m)=B(f,m)X(f,m)).
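
The multiplication described above can be illustrated with a minimal sketch
for a single time-frequency bin. The names binary_mask,
extract_limited_signal, a_min and a_max are hypothetical and are introduced
only for this illustration; the limits of the prescribed range are assumed to
be given.

```python
def binary_mask(z, a_min, a_max):
    """Binary mask B(f,m): 1 inside the prescribed range [a_min, a_max], else 0.

    a_min and a_max are hypothetical names for the limits of the range.
    """
    return 1 if a_min <= z <= a_max else 0


def extract_limited_signal(X, z, a_min, a_max):
    """Return the limited signal values B(f,m) * X(f,m) for one bin (f,m).

    X is the list of observed values X_j(f,m), one complex number per sensor;
    z is the relative value z(f,m) computed for the same bin.
    """
    b = binary_mask(z, a_min, a_max)
    return [b * x for x in X]


# Example: a range of 30 to 100 degrees that covers two representative values.
X = [1 + 2j, 3 - 1j]
kept = extract_limited_signal(X, 40.0, 30.0, 100.0)      # inside the range
dropped = extract_limited_signal(X, 120.0, 30.0, 100.0)  # outside the range
```

A bin whose relative value falls inside the range is passed through
unchanged, and any other bin is set to zero for all sensors.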

[0099] CHARACTERISTICS OF THIS EMBODIMENT

In the conventional method described in "Conventional method 2: The sparsity
method", the distortion of the separated signals increases when the
separation performance is increased. This is because when ε in the
abovementioned Formula (3) is made sufficiently small to increase the
separation performance, the signal components extracted by this binary mask
become more restricted, and most of the samples that should be extracted as
components of the original source signal are not extracted. In other words,
many zero components are padded into each separated signal, making the
separated signals discontinuous and generating musical noise.
[0100] On the other hand, in this embodiment mixed signals (limited signals)
consisting of any number between 2 and M source signals are extracted by a
binary mask B(f,m). Consequently, it is possible to extract, as limited
signals, signals (samples) over a wider range of relative values z(f,m) than
with the binary mask of [Conventional method 2], which only extracts the
values of one signal. For example, in the example shown in FIG. 12A, it is
possible to extract not only sample values whose relative value z(f,m) lies
in the vicinity of representative values a_1 and a_2, but also sample values
whose relative value z(f,m) is positioned between a_1 and a_2. Also, a sample
that is positioned between a_1 and a_2, for example, is highly likely to be a
sample that corresponds to the representative value a_1 or a_2.
[0101] Therefore, even when there are two or more observed signals at the
same frequency at the same time and the sample values are separated from the
representative values that they should basically correspond to, there is
still a high likelihood of extracting these sample values. As a result, it is
possible to suppress the degradation of quality (generation of musical noise)
that occurs when zero components are padded into the separated signals and
make them discontinuous.

[0102] VERIFICATION OF THE EFFECTS OF ZERO-PADDING WITH A BINARY MASK

In the following, the effects of zero-padding with a binary mask are
discussed with regard to the case where the speech signals s_1, s_2 and s_3
of three speakers are observed with two omni-directional microphones (i.e.,
N=3, M=2).

If the proportion of the signal power lost as a result of zero-padding by
using the binary mask is defined as follows:
FORMULA 27

    \frac{\sum_t |s_i(t)|^2 - \sum_t |y_i(t)|^2}{\sum_t |s_i(t)|^2} \times 100    (26)

then the proportion of the signal power lost by the binary mask in the
conventional method of "Conventional method 2: the sparsity method" was as
follows: s_1: 17%, s_2: 14%, s_3: 23%.
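
As a rough illustration of how such a proportion might be computed, the
following sketch assumes that the lost power is expressed relative to the
power of the source signal s_i(t), with y_i(t) denoting the masked signal.
The function name power_loss_percent and the sample values are hypothetical.

```python
def power_loss_percent(s, y):
    """Percentage of the power of s that is missing from y.

    s: samples of the source signal s_i(t)
    y: samples of the signal y_i(t) remaining after masking
    Assumes s has non-zero power (otherwise the ratio is undefined).
    """
    power_s = sum(abs(v) ** 2 for v in s)
    power_y = sum(abs(v) ** 2 for v in y)
    return (power_s - power_y) / power_s * 100.0


# Hypothetical samples: one sample of s is zero-padded in y.
s = [1.0, 2.0, 2.0, 1.0]
y = [1.0, 2.0, 0.0, 1.0]
loss = power_loss_percent(s, y)
```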

[0103] On the other hand, the signal power degradation of the binary mask
B(f,m) according to this embodiment was s_1: 2.5%, s_2: 5.7% for the case
where the two signals s_1 and s_2 were mixed together, and s_2: 8.1%,
s_3: 0.7% for the case where the two signals s_2 and s_3 were mixed together.

Thus, in this embodiment the degradation of the signals by the binary mask
B(f,m) is less than in the conventional method. This indicates that musical
noise is less likely to be generated in this embodiment.
[0104] PERFORMANCE COMPARISON

The simulation results obtained with this embodiment are shown below.
TABLE 5

                         SIR1   SIR2   SIR3   SDR1   SDR2   SDR3
Conventional method 2    15.4   10.3   14.6    9.8   11.9    9.2
This embodiment             -    8.4   16.4      -   15.0   20.9
                         13.1    8.2      -   17.4   13.8      -

In this example, limited signals are extracted with a binary mask according
to this embodiment, and signal separation is performed on these limited
signals by supplying them to an ICA process. Also, in this example, speech
signals from three speakers (two male and one female) were used as the source
signals, and mixtures of these signals were produced to simulate the results
of observing them in an anechoic environment with two omni-directional
microphones. As this table shows, with the method of this embodiment it was
possible to obtain SDR values substantially higher than with conventional
method 2 with almost no degradation in the separation performance (SIR). This
shows that the separation could be performed with much lower distortion in
the method of this embodiment.
[0105] SEVENTH EMBODIMENT

This embodiment also relates to the first present invention, and is a variant
of the abovementioned sixth embodiment. That is, this embodiment also uses a
binary mask to extract a limited signal in cases where 2≤V≤M, but it differs
in terms of the method used to produce the binary mask B(f,m) and the limited
signal calculation processing. In the following, only the method used to
produce the binary mask B(f,m) and the limited signal calculation processing
are described; the other aspects of the processing and functional
configuration are the same as in the first embodiment or second embodiment,
so their description is omitted.

[0106] The purpose of a binary mask B(f,m) according to this embodiment is to
extract observed signal components other than the abovementioned limited
signal. That is, a binary mask B(f,m) produced by a mask generation unit
according to this embodiment is a function that takes a low level value for
relative values inside a prescribed range including V representative values
(this set is referred to as G_k) and a high level value for representative
values that are not inside this prescribed range (G_k^c), and where the
transitions from the high level value to the low level value occur in a
discontinuous fashion. Here, 2≤V≤M.

[0107] Specifically, a mask generation unit 51-k according to this embodiment
generates a binary mask as shown by Formula (3) above for the representative
values included in set G_k^c, for example. In this embodiment, the relative
values z(f,m) could be derived from parameters such as the phase differences
z_1(f,m), amplitude ratios z_2(f,m), or the arrival directions z_3(f,m)
obtained from the phase differences z_1(f,m). FIG. 12B shows an example of a
binary mask B(f,m) according to this embodiment. This is an example of a
binary mask that takes a low level value (e.g., 0) for relative values
z_3(f,m) inside a prescribed range that includes V=2 representative values
a_1 and a_2, and takes a high level value (e.g., 1) for representative value
a_3, which is not inside this prescribed range. The vertical axis of this
figure represents the gain of the binary mask, and the horizontal axis
represents the relative value z_3(f,m) (the arrival direction of the signal
in degrees). As this figure shows, the high level values of this binary mask
are flat, and the transitions between the high level value and the low level
value are discontinuous.

[0108] The limited signal extraction unit of this embodiment extracts limited
signal values X̂(f,m) by subtracting the product of this binary mask B(f,m)
and the frequency domain signal values X_j(f,m) from the frequency domain
signal values X_j(f,m). For example, by producing binary masks M_i(f,m) as
shown in Formula (3) above for the N-M representative values included in set
G_k^c, the values X̂(f,m) of a limited signal consisting only of M source
signals can be calculated by performing the following calculation:
FORMULA 28

    \hat{X}(f,m) = X(f,m) - \sum_{a_i \in G_k^c} \{ M_i(f,m) X(f,m) \}    (27)

Although the abovementioned binary masks M_i(f,m) of Formula (3) are binary
masks that take a high level value for only one representative value each,
the processing of this embodiment can also be implemented using binary masks
that take a high level value for two or more representative values. Also,
instead of a binary mask, the processing of this embodiment may be
implemented using a smooth-profile mask as described above.

When a limited signal X̂(f,m) has been calculated, the subsequent limited
signal separation, time domain transformation and signal combination
processes can be performed in the same way as in the first embodiment or
second embodiment.
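
The subtraction-based extraction of this embodiment can be sketched as
follows for one time-frequency bin. This is only an illustration under
stated assumptions: each single-value mask is assumed to be 1 when the
relative value z(f,m) lies within a half-width eps of an excluded
representative value and 0 otherwise, and these ranges are assumed not to
overlap. The names limited_signal_by_subtraction, excluded and eps are
hypothetical.

```python
def limited_signal_by_subtraction(X, z, excluded, eps):
    """Compute X - sum_i { M_i * X } for one bin (f,m).

    X        : observed values X_j(f,m), one complex number per sensor
    z        : relative value z(f,m) of this bin
    excluded : representative values a_i belonging to the set G_k^c
    eps      : assumed half-width of each single-value mask M_i
    The ranges around the excluded values are assumed not to overlap,
    so at most one mask M_i is 1 for any given bin.
    """
    mask_sum = sum(1 for a in excluded if abs(z - a) <= eps)
    return [x - mask_sum * x for x in X]


X = [2 + 0j, 1j]
removed = limited_signal_by_subtraction(X, 100.0, [95.0], 10.0)  # near a_3
kept = limited_signal_by_subtraction(X, 40.0, [95.0], 10.0)      # elsewhere
```

Bins attributed to an excluded source are cancelled, and all other bins are
retained, which is the complement of the behaviour of the masks M_i.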
[0109] EIGHTH EMBODIMENT

This embodiment relates to the second present invention, in which masks are
defined by clustering the observed values in M-dimensional domains when the
signals are observed with M sensors. The following description focuses on the
differences between this embodiment and the first embodiment, and the
description of items that are the same as in the first embodiment is omitted.

FIG. 13 shows a block diagram of an example of the configuration of
representative value generation unit 430, mask control unit 40 and limited
signal generation unit 450-k in this embodiment. This figure only shows one
system that obtains V separated signals. In this embodiment, 1≤V≤M.

[0110] The signal separation device of this embodiment differs structurally
from signal separation device 1 of the first embodiment in the representative
value generation unit and limited signal generation unit. Specifically, this
embodiment is provided with a representative value generation unit 430
(FIG. 13) instead of the representative value generation unit 30 of signal
separation device 1 of the first embodiment (FIG. 1), and is provided with a
limited signal generation unit 450-k (FIG. 13) instead of the limited signal
generation unit 50-k of signal separation device 1 (FIG. 1). The other
aspects of the configuration are the same as in the first embodiment.

FIG. 14 shows a flowchart that illustrates the signal separation process in
this embodiment. In the following, the signal separation process of this
embodiment is described with reference to this flowchart.
[0111] First, as a preliminary process, the time-domain observed signals
x_j(t) (j=1,...,M) observed by each sensor are stored in memory unit 2
(FIG. 1). Then, when the signal separation process is started, signal
separation processor 3 performs the following processing under the control of
control unit 10.

First, signal separation processor 3 accesses memory unit 2 under the control
of control unit 10 and sequentially reads out each of the observed signal
values x_j(t), which it sends to frequency domain transformation unit 20
(Step S21). Frequency domain transformation unit 20 uses a transformation
such as a short time discrete Fourier transform to transform these signal
values into a series of frequency-domain observed signal values X_j(f,m) for
each time interval, which it stores in temporary memory unit 90 (Step S22).
[0112] Next, clustering unit 432 (FIG. 13) reads out the frequency-domain
observed signal values X_1(f,m),...,X_M(f,m) stored in temporary memory unit
90 (FIG. 1). Clustering unit 432 (FIG. 13) then clusters the observed signal
vectors X(f,m)=[X_1(f,m),...,X_M(f,m)] (called the "first vectors")
consisting of the frequency-domain signal values X_1(f,m),...,X_M(f,m) into N
clusters C_i(f) (i=1,...,N), so that the number of clusters generated is
equal to the number of signal sources N (Step S23). These N clusters C_i(f)
are stored in temporary memory unit 90 (FIG. 1).

[0113] In this embodiment a cluster is a set of observed signal vectors
X(f,m), and is expressed using the set T_i of discrete time intervals in the
form C_i(f)={X(f,m)|m∈T_i}. Also, the aim of clustering is to classify
samples (observed signal vectors X(f,m)) in which the same signal source is
predominant (i.e., a main component) into the same cluster. Note that the
resulting N clusters C_1(f),...,C_N(f) do not necessarily need to be disjoint
(C_i(f)∩C_j(f)=∅; i≠j), and there may be some elements that do not belong to
any cluster:
FORMULA 29 



[0114] DETAILED DESCRIPTION OF THE PROCESSING IN 
CLUSTERING UNIT 432 

The processing performed by clustering unit 432 is described in 
greater detail here. 

The clustering unit 432 of this example performs clustering after 
normalizing each sample so that clustering can be suitably achieved — i.e., so 
that samples (observed signal vectors X(f,m)) in which the same signal source 
is dominant are classified into the same clusters. 

For example, a normalization unit 432a (FIG. 13) could read in the observed
signal vectors X(f,m) from temporary memory unit 90 (FIG. 1), perform the
following calculation:
FORMULA 30

    \mathrm{sign}(X_j(f,m)) \leftarrow \begin{cases} X_j(f,m)/|X_j(f,m)| & (|X_j(f,m)| \ne 0) \\ 0 & (|X_j(f,m)| = 0) \end{cases}    (28)

and normalize it as follows:

    X(f,m) \leftarrow \begin{cases} X(f,m)/\mathrm{sign}(X_j(f,m)) & (|X_j(f,m)| \ne 0) \\ X(f,m) & (|X_j(f,m)| = 0) \end{cases}    (29)

and then a cluster generation unit 432b could cluster the results of this
normalization.

[0115] If necessary, the normalization unit 432a of this example follows the
normalization of Formula (28) and Formula (29) with additional normalization
as follows:
FORMULA 31

    X(f,m) \leftarrow \begin{cases} X(f,m)/\|X(f,m)\| & (\|X(f,m)\| \ne 0) \\ X(f,m) & (\|X(f,m)\| = 0) \end{cases}    (30)

and cluster generation unit 432b performs clustering on the results of this
normalization. Here, the vector length ||X(f,m)|| is the norm of X(f,m),
which can, for example, be obtained from the L_2 norm ||X(f,m)||=L_2(X(f,m))
defined as follows:
FORMULA 32

    L_k(X(f,m)) = \left( \sum_{j=1}^{M} |X_j(f,m)|^k \right)^{1/k}    (31)

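
The norm-based normalization can be illustrated with a short sketch. It
assumes that the L_k norm is the k-th root of the sum of the k-th powers of
the element magnitudes, and that a zero vector is left unchanged. The
function names lk_norm and normalize_to_unit_norm are hypothetical.

```python
def lk_norm(X, k=2):
    """L_k norm of a vector X of (possibly complex) values; k=2 by default."""
    return sum(abs(x) ** k for x in X) ** (1.0 / k)


def normalize_to_unit_norm(X, k=2):
    """Divide X by its L_k norm; a vector with zero norm is returned as is."""
    n = lk_norm(X, k)
    if n == 0:
        return list(X)
    return [x / n for x in X]


X = [3 + 4j, 0j]                       # |3+4j| = 5, so the L_2 norm is 5
X_unit = normalize_to_unit_norm(X)     # approximately [0.6+0.8j, 0]
X_zero = normalize_to_unit_norm([0j, 0j])
```

After this step only the direction of each observed signal vector remains,
so samples of different loudness from the same source can fall into the same
cluster.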

[0116] Also, as the clustering method performed by cluster generation unit
432b, it is possible to use a method described in many textbooks, such as
hierarchical clustering or k-means clustering (see, e.g., Morio Onoe (trans.):
"Pattern Classification," Shingijutsu Communications, ISBN 4-915851-24-9,
chapter 10). Note that in any clustering method, the distance between two
samples X(f,m) and X'(f,m) is defined as a means of measuring the proximity
between samples, and clustering is performed so that every effort is made to
include samples that are close to each other in the same clusters.
[0117] For example, when the samples are normalized only by Formula (29)
above, cluster generation unit 432b performs clustering using the cosine
distance between two normalized observed signal vectors X(f,m) as a measure
of the distance between them. The cosine distance between two samples X(f,m)
and X'(f,m) is defined as follows:

    1 - X^H(f,m) \cdot X'(f,m) / (\|X(f,m)\| \cdot \|X'(f,m)\|)    (32)

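
A possible way of evaluating this distance for complex vectors is sketched
below. Because the inner product of two complex vectors is itself complex,
this sketch takes its magnitude so that the result is a real number between
0 and 1; that choice is an assumption of the illustration and is not stated
in the description. The function name cosine_distance is hypothetical, and
both vectors are assumed to be non-zero.

```python
def cosine_distance(X, Y):
    """Cosine distance between two non-zero complex vectors.

    Uses |X^H Y| / (||X|| ||Y||); taking the magnitude of the inner
    product is an assumption made here to obtain a real-valued distance.
    """
    inner = sum(x.conjugate() * y for x, y in zip(X, Y))
    norm_x = sum(abs(x) ** 2 for x in X) ** 0.5
    norm_y = sum(abs(y) ** 2 for y in Y) ** 0.5
    return 1.0 - abs(inner) / (norm_x * norm_y)


d_parallel = cosine_distance([1 + 0j, 0j], [2 + 0j, 0j])    # same direction
d_orthogonal = cosine_distance([1 + 0j, 0j], [0j, 1 + 0j])  # orthogonal
```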

[0118] Also, when the samples have been normalized according to the
abovementioned Formula (29) and Formula (30), cluster generation unit 432b
performs clustering using the L_2 norm ||X(f,m)-X'(f,m)||=L_2(X(f,m)-X'(f,m))
of the difference X(f,m)-X'(f,m) between two normalized observed signal
vectors, or an L_k norm with any value of k, or the cosine distance (Formula
(32)) as a measure of the distance between them. (This ends the [Detailed
description of the processing in clustering unit 432].)

Next, representative value calculation unit 433 sequentially extracts each
cluster C_i(f) stored in temporary memory unit 90 (FIG. 1), and calculates a
representative vector a_i(f) (called a "second vector") to represent each
cluster C_i(f) (Step S24).

[0119] DETAILED DESCRIPTION OF THE PROCESSING IN REPRESENTATIVE VALUE
CALCULATION UNIT 433

For example, representative vector generation unit 433a of representative
value calculation unit 433 (FIG. 13) first sequentially extracts each cluster
C_i(f) stored in temporary memory unit 90 (FIG. 1), and calculates the mean
of the sample values X(f,m) belonging to each cluster C_i(f)
FORMULA 33

    a_i(f) = \sum_{X(f,m) \in C_i(f)} X(f,m) / |C_i(f)|

as a representative vector a_i(f) relating to each signal source.
Alternatively, the samples X(f,m) belonging to each cluster C_i(f) may be
suitably quantized and the most frequent values chosen as the representative
vectors a_i(f). The representative vectors a_i(f) obtained in this way are
stored in temporary memory unit 90 (FIG. 1).
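
The mean-based calculation of a representative vector can be sketched as
follows, with a cluster represented as a list of observed signal vectors.
The function name representative_vector is hypothetical, and the cluster is
assumed to contain at least one sample.

```python
def representative_vector(cluster):
    """Element-wise mean of the sample vectors X(f,m) in one cluster.

    cluster: non-empty list of vectors, each a list of M complex values.
    """
    count = len(cluster)
    dim = len(cluster[0])
    return [sum(sample[j] for sample in cluster) / count for j in range(dim)]


cluster = [[1 + 1j, 2 + 0j], [3 - 1j, 4 + 0j]]
a = representative_vector(cluster)
```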
[0120] Next, a sorting unit 433b (FIG. 13) reads out these representative
vectors a_1(f),...,a_N(f) from temporary memory unit 90 (FIG. 1) and
reassigns the subscript i of each representative vector a_i(f) so that the
correspondence between these representative vectors a_1(f),...,a_N(f) and
each source signal s_k(t) becomes the same at all frequencies f (Step S25).

For example, sorting unit 433b (FIG. 13) might perform the following
calculation using the representative vector a_i(f) for each frequency f:
FORMULA 34

    \theta_i(f) = \arccos \frac{\arg(a_{ji}(f)/a_{j'i}(f))}{2\pi f v^{-1} \|d_j - d_{j'}\|}    (33)

to calculate the estimated value θ_i(f) of the arrival direction of source
signal i at each frequency f. Here, d_j is the position of sensor j, v is the
velocity of the signal, and a_ji(f) is the j-th element of representative
vector a_i(f). For the values of d_j and v, it is assumed that data pre-stored
in temporary memory unit 90 is used, for example.

[0121] The estimated values θ_i(f) calculated in this way are, for example,
stored in temporary memory unit 90 (FIG. 1) after associating them with the
representative vector a_i(f) used to calculate them. Next, sorting unit 433b
(FIG. 13), for example, reads in each estimated value θ_i(f) from temporary
memory unit 90, and sorts them into a prescribed ordering (e.g., ascending or
descending order) for each frequency f. This sorting could, for example, be
performed using a known sorting algorithm. The information representing the
order of the sorted representative vectors at each frequency f
(j'(f,a_i(f))=1,2,...,N) is then stored in temporary memory unit 90 (FIG. 1).
Next, sorting unit 433b (FIG. 13) reads in this ordering information
j'(f,a_i(f)) from temporary memory unit 90, for example, and changes the
correspondence between the representative vectors and the indices i (i.e., it
reassigns the subscripts i of a_i(f)) so that each a_i(f) is made to
correspond to the j'(f,a_i(f))-th source signal. These representative vectors
a_i(f) with reassigned subscripts i are then stored in temporary memory unit
90 (FIG. 1).
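
The direction estimate and the subsequent sorting at one frequency might be
sketched as follows. The sketch assumes two sensors placed on a line, so
that sensor positions are scalars, and it assumes the arrival direction is
obtained as the arccosine of the phase difference divided by
2πf·|d_j - d_j'|/v. The names estimate_doa, sort_by_doa and steering, and
the numerical values, are hypothetical.

```python
import cmath
import math


def estimate_doa(a, f, d_j, d_jp, v=340.0, j=0, jp=1):
    """Estimated arrival direction (radians) from representative vector a.

    a    : representative vector, one complex element per sensor
    f    : frequency in Hz; d_j, d_jp: sensor positions in metres
    v    : signal velocity in m/s (340 m/s is an assumed default)
    """
    phase = cmath.phase(a[j] / a[jp])
    cos_theta = phase * v / (2.0 * math.pi * f * abs(d_j - d_jp))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.acos(cos_theta)


def sort_by_doa(vectors, f, d_j, d_jp, v=340.0):
    """Return the representative vectors in ascending order of direction."""
    return sorted(vectors, key=lambda a: estimate_doa(a, f, d_j, d_jp, v))


def steering(theta_deg, f, d, v=340.0):
    """Hypothetical test vector for a source at theta_deg degrees."""
    phase = 2.0 * math.pi * f * d * math.cos(math.radians(theta_deg)) / v
    return [cmath.exp(1j * phase), 1 + 0j]


f, d = 1000.0, 0.04
a_120, a_60 = steering(120.0, f, d), steering(60.0, f, d)
theta_60 = estimate_doa(a_60, f, d, 0.0)
ordered = sort_by_doa([a_120, a_60], f, d, 0.0)
```

Applying the same ordering rule at every frequency gives each subscript i
the same meaning across frequencies, which is the purpose of Step S25.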

[0122] Next, mask control unit 40 substitutes the data specifying a set G_0
whose elements are these representative vectors a_i(f) into a variable SG_0,
and stores this variable SG_0 in temporary memory unit 90. Mask control unit
40 also initializes a variable SG specifying a set G to G=∅ (empty set) and
sets a variable k to zero, and stores them in temporary memory unit 90 (Step
S26).

Next, under the control of mask control unit 40, processing is performed by a
plurality of systems (u systems) comprising limited signal generation unit
50-k (k=1,...,u), limited signal separation unit 60-k and time domain
transformation unit 70-k until all N separated signals have been obtained.

[0123] First, mask control unit 40 adds 1 to the value of variable k stored
in temporary memory unit 90 to obtain a new value for variable k, which is
stored back in temporary memory unit 90 (Step S27).

Next, mask control unit 40 retrieves the variables SG_0 and SG from temporary
memory unit 90 (FIG. 1). Then, in mask control unit 40, a set G_k consisting
of V (≤M) suitable representative vectors a_p(f) (p=1,...,V) (called the
"third vectors") including the members of the complementary set G^c of the
set G specified by SG (the notation α^c represents the complementary set of
α) is selected from the set G_0 specified by variable SG_0, the data
specifying this set G_k is assigned to variable SG_k, and this variable SG_k
is stored in temporary memory unit 90 (Step S28). That is, mask control unit
40 extracts, from among the representative vectors a_1(f),...,a_N(f), the V
representative vectors a_p(f) (p=1,...,V) corresponding to the V signals
extracted as limited signals.
[0124] In this embodiment, sample values X(f,m) that are close to the
representative vectors a_p(f) included in this set G_k are extracted, and
sample values X(f,m) that are close to representative vectors not included in
this set G_k (i.e., the elements of set G_k^c, where *^c is the complementary
set of *) are not extracted, thereby producing a limited signal X̂(f,m)
consisting of a mixture of V signals.

For this purpose, in this embodiment the mask generation unit 451-k of
limited signal generation unit 450-k (FIG. 13) reads in the variables SG_k
and SG_0 and the observed signal vector X(f,m) from temporary memory unit 90
(FIG. 1) and generates the following mask M_k(f,m) (Step S29):
[0125] FORMULA 35

    M_k(f,m) = \begin{cases} 1 & \max_{a_p(f) \in G_k} D(X(f,m), a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m), a_q(f)) \\ 0 & \text{otherwise} \end{cases}

Here, D(X(f,m),a_i(f)) represents the squared Mahalanobis distance between
vector X(f,m) and a_i(f):

    D(X(f,m), a_i(f)) = (X(f,m) - a_i(f))^H \Sigma^{-1} (X(f,m) - a_i(f))

Σ represents the covariance matrix of cluster C_i:

    \Sigma = \frac{1}{|C_i|} \sum_{X(f,m) \in C_i} (X(f,m) - a_i(f)) (X(f,m) - a_i(f))^H

and |C_i| represents the number of samples that belong to cluster C_i.
Alternatively, when it is known that the magnitudes of the source signals are
more or less the same, a covariance matrix of Σ=I (identity matrix) may be
used.
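
For the simplified case in which the covariance matrix is the identity
matrix, the mask decision reduces to comparing squared Euclidean distances,
and can be sketched as follows. The names squared_distance and mask_value
are hypothetical, and both the set G_k (included) and its complement
(excluded) are assumed to be non-empty.

```python
def squared_distance(X, a):
    """Squared distance between X and a when the covariance matrix is I."""
    return sum(abs(x - y) ** 2 for x, y in zip(X, a))


def mask_value(X, included, excluded):
    """Return 1 if X is closer to every vector of G_k than to any of G_k^c.

    included: representative vectors in G_k (assumed non-empty)
    excluded: representative vectors in G_k^c (assumed non-empty)
    """
    farthest_included = max(squared_distance(X, a) for a in included)
    nearest_excluded = min(squared_distance(X, a) for a in excluded)
    return 1 if farthest_included < nearest_excluded else 0


included = [[1 + 0j, 0j], [0j, 1 + 0j]]
excluded = [[-1 + 0j, -1 + 0j]]
m_near = mask_value([0.5 + 0j, 0.5 + 0j], included, excluded)
m_far = mask_value([-0.9 + 0j, -0.9 + 0j], included, excluded)
```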

[0126] This mask M_k(f,m) is stored in temporary memory unit 90 (FIG. 1), and
then limited signal extraction unit 452-k (FIG. 13) reads in mask M_k(f,m)
and observed signal vector X(f,m) from temporary memory unit 90, calculates
the product of mask M_k(f,m) and observed signal vector X(f,m):

    \hat{X}_k(f,m) = M_k(f,m) X(f,m)

and extracts the limited signal values X̂_k(f,m) emitted from V signal sources
(Step S30).

[0127] These limited signal values X̂_k(f,m) are stored in temporary memory
unit 90 (FIG. 1) and are then sent to limited signal separation unit 60-k,
which uses these limited signal values X̂_k(f,m) to perform signal separation
on the limited signals (Step S31). Here, the limited signal values X̂_k(f,m)
are treated as approximations to the values of mixed signals consisting of
the signals emitted from V (1≤V≤M) signal sources. A separation matrix for
these signals can therefore be estimated using the independent component
analysis method described in [Conventional method 1]. Specifically,
separation can be performed using Formula (2) mentioned in [Conventional
method 1], for example, using limited signal values X̂_k(f,m) as the
independent component analysis inputs instead of the observed signal values
X. Note that when V=1, the processing of Step S31 is not necessary.
[0128] To perform the ICA separation in this embodiment, the limited signal
values X̂_k(f,m) are first used to generate a separation matrix W(f,m) in ICA
separation matrix estimation unit 61-k according to the abovementioned
learning rule of Formula (2), and this separation matrix W(f,m) is stored in
temporary memory unit 90. This separation matrix W(f,m) is, for example,
generated using feedback from the output values Y_k(f,m) from
permutation/scaling resolution unit 62-k, which is described below. The
resulting separation matrix W(f,m) is sent to permutation/scaling resolution
unit 62-k. Permutation/scaling resolution unit 62-k uses this separation
matrix W(f,m) and the limited signal values X̂_k(f,m) to generate the
respective separated signal values
Y_k(f,m)=[Y_k1^{Π_k1}(f,m),...,Y_kV^{Π_kV}(f,m)]^T by performing the
calculation Y_k(f,m)=W(f,m)X̂_k(f,m), and stores them in temporary memory unit
90. Permutation/scaling resolution unit 62-k then, for example, feeds back
these separated signal values Y_k(f,m) to resolve the permutation problem
with the method mentioned in [Conventional method 1]. After resolving the
permutation problem, permutation/scaling resolution unit 62-k then applies
tags Π_kq to the separated signal values Y_kq (q=1,...,V) to show which
source signal the separated signal values Y_kq (q=1,...,V) correspond to, and
these are stored together in temporary memory unit 90. Here, these tags Π_kq
are represented by adding the superscript Π_kq to the separated signal values
Y_kq.

[0129] For example, permutation/scaling resolution unit 62-k could compare
the estimated arrival directions θ_q(f) of the signals, which are obtained
using the inverse matrix of separation matrix W(f) extracted from temporary
memory unit 90 (or the Moore-Penrose pseudo-inverse matrix when N≠M) by the
following formula:
FORMULA 36

    \theta_q(f) = \arccos \frac{\arg([W^{-1}_{jq}(f)]/[W^{-1}_{j'q}(f)])}{2\pi f v^{-1} \|d_j - d_{j'}\|}

(where v is the signal velocity and d_j is the position of sensor j)
with the representative vectors a_p(f) included in the set G_k indicated by
the variable SG_k extracted from temporary memory unit 90, and could
associate the representative vector a_p(f) closest to θ_q with the q-th
separated signal Y_kq (Step S32). In other words, permutation/scaling
resolution unit 62-k applies tags Π_kq representing the representative values
a_i to the separated signals Y_kq (in other words, it associates the tags
Π_kq with the separated signals Y_kq).
[0130] After that, permutation/scaling resolution unit 62-k extracts the
separation matrix W(f) from temporary memory unit 90 and resolves the ICA
scaling problem by updating each row w_q(f) thereof as follows:

    w_q(f) \leftarrow [W^{-1}(f)]_{jq} w_q(f)

and then stores the updated separation matrix W(f) in temporary memory unit
90. For the subsequent processing in signal combination unit 80, it is
desirable that the same value of j is used for the entire series k in this
process.

The separated signal values Y_kq with the tags Π_kq appended are sent to time
domain transformation unit 70-k. Time domain transformation unit 70-k uses,
for example, a short time inverse discrete Fourier transform or the like to
transform each of the separated signal values Y_kq (which are obtained in the
time-frequency domain) into time-domain signal values, and stores these
transformed values in temporary memory unit 90 (Step S33). Note that these
time-domain signal values y_k(t)=[y_k1^{Π_k1}(t),...,y_kV^{Π_kV}(t)]^T are
also associated with the abovementioned tags Π_kq. When these associations
are made, time domain transformation unit 70-k first extracts the tags Π_kq
associated with the frequency-domain signal values Y_kq from temporary memory
unit 90 for each frequency and time interval. Next, time domain
transformation unit 70-k judges whether or not the tags Π_kq at each
frequency and time interval are all the same. If they are all the same, the
time-domain signal values y_kq are tagged by associating them with the tags
Π_kq applied to the frequency-domain signal values Y_kq. On the other hand,
if they are not all the same, then the tags of the time-domain signal values
y_kq are determined based on a majority decision.

[0131] Next, in mask control unit 40, the variables SG and SG_k are extracted
from temporary memory unit 90, and the union set G∪G_k of the sets G and G_k
represented by these variables is taken as a new set G; this set G is
substituted into variable SG, and this variable SG is stored in temporary
memory unit 90 (Step S34). Also, mask control unit 40 reads out variables SG
and SG_0 from temporary memory unit 90, and judges whether or not this new
set G is equal to set G_0 (Step S35). Here, unless G=G_0, the processing
returns to Step S27.
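
The control flow of Steps S26 to S28, S34 and S35 can be sketched as a loop
that keeps choosing sets G_k until their union equals G_0. The selection
policy used here (take the not-yet-covered representatives first, then fill
up to V with already covered ones) is only one possible choice and is an
assumption of this illustration, as is the function name choose_groups. It
is assumed that 1 ≤ V ≤ N.

```python
def choose_groups(G0, V):
    """Return the list of sets G_1, G_2, ... (as lists) whose union is G0.

    G0: list of N representatives (any hashable labels for this sketch)
    V : number of representatives per set, assumed to satisfy 1 <= V <= N
    """
    G = []        # representatives covered so far (set G, initially empty)
    groups = []
    while len(G) < len(G0):                       # repeat until G equals G0
        uncovered = [a for a in G0 if a not in G]     # members of G^c
        Gk = uncovered[:V]
        for a in G0:                              # pad to V members if needed
            if len(Gk) == V:
                break
            if a not in Gk:
                Gk.append(a)
        groups.append(Gk)
        G = G + [a for a in Gk if a not in G]     # G <- union of G and G_k
    return groups


groups = choose_groups(["a1", "a2", "a3"], 2)
```

With N=3 and V=2 this policy yields two systems, and the representative
shared by both sets is the one whose separated signals later need to be
selected or averaged in the signal combination unit.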

[0132] On the other hand, if G=G_0, then in signal combination unit 80, the
separated signals y_kp(t) output from each system k (time domain
transformation unit 70-k; k=1,...,u) are read out from temporary memory unit
90, and these are selected and combined to yield all N separated signals
(Step S36). For example, signal combination unit 80 might first compare the
tags Π_kq of each separated signal y_kp(t) read out from temporary memory
unit 90. Here, when it is judged that there are no separated signal values
y_kp(t) that have the same tag in a plurality of systems k, signal
combination unit 80 outputs all the separated signal values y_kq(t) as the
final separated signal values y_i(t) (i=1,...,N) (Step S37). On the other
hand, when it is judged that there are separated signal values having the
same tag in a plurality of systems, signal combination unit 80 either
appropriately selects one of these separated signal values with the same tag
and outputs it as a final separated signal value y_i(t), or calculates the
mean of the separated signal values with the same tag and uses this mean
value as the output signal (Step S37).
[0133] Here, in the process whereby one of the separated signal values
y_kq(t) is appropriately selected and output as the final separated signal
value y_i(t), signal combination unit 80 could, for example, determine which
of the separated signal values y_kq(t) having the same tag a_i contains the
greatest power, and output it as the final separated signal value y_i(t).
Also, in the process whereby the mean of the separated signal values having
the same tag is output as the final separated signal value y_i(t), signal
combination unit 80 could, for example, use the following formula:

FORMULA 37

    y_i(t) = \frac{1}{K} \sum_{\Pi_{kq} = a_i} y_{kq}(t)

(where K is the number of separated signals having the same tag a_i)
In this way, the N signals are separated with low distortion.
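
The two combination options described above (selecting the signal with the
greatest power, or averaging the signals that share a tag) might be sketched
as follows. The data layout, a list of (tag, samples) pairs with
real-valued samples of equal length, and the name combine are assumptions
made for this illustration.

```python
def combine(tagged_signals, mode="mean"):
    """Combine separated signals that carry the same tag.

    tagged_signals: list of (tag, samples) pairs from all systems k
    mode          : "mean" averages signals with the same tag sample by
                    sample; "max_power" keeps the one with greatest power
    Returns a dict mapping each tag to its final separated signal.
    """
    by_tag = {}
    for tag, samples in tagged_signals:
        by_tag.setdefault(tag, []).append(samples)

    final = {}
    for tag, group in by_tag.items():
        if mode == "max_power":
            final[tag] = max(group, key=lambda y: sum(v * v for v in y))
        else:
            K = len(group)  # number of separated signals sharing this tag
            final[tag] = [sum(values) / K for values in zip(*group)]
    return final


signals = [(1, [1.0, 2.0]), (2, [0.5, 0.5]), (1, [3.0, 4.0])]
averaged = combine(signals, mode="mean")
strongest = combine(signals, mode="max_power")
```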
[0134] As a variant of this embodiment, the limited signal values could be
generated directly from the formula:

FORMULA 38

    \hat{X}_k(f,m) = \begin{cases} X(f,m) & \max_{a_p(f) \in G_k} D(X(f,m), a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m), a_q(f)) \\ 0 & \text{otherwise} \end{cases}

without generating a mask M(f,m). For example, limited signal generation unit
450-k could judge whether or not the observed signal vectors X(f,m) satisfy
the following condition:
FORMULA 39

    \max_{a_p(f) \in G_k} D(X(f,m), a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m), a_q(f))

and it could extract the observed signal vectors X(f,m) judged to satisfy
this condition as the signal values emitted from the signal sources.
[0135] NINTH EMBODIMENT 

This embodiment relates to the third present invention. 
CONFIGURATION 

FIG. 15 is a block diagram showing an example of the configuration of a
blind signal separation device 500 according to this embodiment. The arrows in
this figure indicate the flow of data, but the flow of data into and out of
control unit 521 and temporary memory unit 522 is not shown. Specifically, even
when data passes through control unit 521 or temporary memory unit 522, the
associated process is not shown.

[0136] The configuration of this embodiment is described first with
reference to this figure.

As shown in the example in FIG. 15, the signal separation device 500 of this
embodiment includes a memory unit 501 and a signal separation processor 502
that is electrically connected thereto by a hard-wired or wireless connection.

Memory unit 501 might be, for example, a hard disk device, a magnetic
recording device such as a flexible disk or magnetic tape device, an optical
disk device such as a DVD-RAM (random access memory) or CD-R (recordable)/RW
(rewritable) device, a magneto-optical recording device such as an MO
(magneto-optical) disc device, or a semiconductor memory such as an EEPROM
(electrically erasable programmable read-only memory) or flash memory. Memory
unit 501 may be situated inside the same enclosure as signal separation
processor 502, or it may be housed separately.
[0137] This signal separation processor 502 consists of hardware
configured from elements such as a processor and RAM, for example, and it
also incorporates a frequency domain transformation unit 511, a mixing
matrix estimation unit 512, a permutation problem resolution unit 513, a
scaling problem resolution unit 514, a column selection unit 516, a matrix
generation unit 517, a separation matrix generation unit 518, a separated
signal generation unit 519, a time domain transformation unit 520, a control
unit 521, and a temporary memory unit 522. Also, the mixing matrix
estimation unit 512 of this example incorporates a clustering unit 512a, a
representative vector calculation unit 512b, and a vector combination unit
512c. Furthermore, clustering unit 512a incorporates a normalization unit
512aa and a cluster generation unit 512ab.
[0138] PROCESSING 

FIG. 16 is a flowchart illustrating the overall processing that takes
place in signal separation device 500 in this embodiment. In the following, the
processing of signal separation device 500 is described with reference to
FIG. 15 and FIG. 16. Note that the following description relates to situations
where signals emitted from N (N≥2) signal sources are mixed together and
observed with M sensors.
[0139] OVERALL PROCESSING 

Under the control of control unit 521, signal separation device 500
performs the following processing.

First, the observed signal values x_1(t),...,x_M(t) (where t is time)
observed by the M sensors are read in from memory unit 501 and input to
frequency domain transformation unit 511 (FIG. 15). Frequency domain
transformation unit 511 uses a transformation such as a short-time discrete
Fourier transform to transform these observed signal values x_1(t),...,x_M(t)
into frequency-domain observed signal values X_1(f,m),...,X_M(f,m) (where m is
the discrete time interval) (Step S51). These frequency-domain signal values
X_1(f,m),...,X_M(f,m) are stored in temporary memory unit 522, and are read in
by clustering unit 512a of mixing matrix estimation unit 512. Clustering unit
512a clusters the resulting observed signal vectors
X(f,m)=[X_1(f,m),...,X_M(f,m)]^T into N clusters C_i(f) (i=1,...,N) for each
frequency f (Step S52). Each cluster C_i(f) is sent to representative vector
calculation unit 512b, and representative vector calculation unit 512b
calculates a representative vector a_i(f) for each cluster C_i(f) (Step S53).
Each representative vector a_i(f) is stored in temporary memory unit 522, and
is then sequentially read out by vector combination unit 512c, which generates
an estimated mixing matrix A(f)=[a_1(f),...,a_N(f)] whose columns are the
representative vectors a_i(f) (Step S54). The resulting estimated mixing matrix
A(f) is stored in temporary memory unit 522.

[0140] Permutation problem resolution unit 513 reads in the estimated
mixing matrix A(f) from temporary memory unit 522, and resolves the
permutation problem by sorting the columns of estimated mixing matrix A(f)
(Step S55). In this process it is possible to employ feedback of the separated
signal values Y_1(f,m),...,Y_N(f,m) as described below, in which case the
permutation problem can be resolved more accurately.
[0141] Next, after the scaling problem has been resolved by normalizing
the columns of the estimated mixing matrix A(f) in scaling problem resolution
unit 514 (Step S56), this estimated mixing matrix A(f) is used by separation
matrix generation unit 518 to generate a separation matrix W(f,m) (Step S57).
The resulting separation matrix W(f,m) is stored in temporary memory unit
522, from where it is then sent to separated signal generation unit 519, which
reads in the frequency-domain signal values X_1(f,m),...,X_M(f,m) from
temporary memory unit 522, and calculates the separated signal vectors
Y(f,m)=[Y_1(f,m),...,Y_N(f,m)] by performing the calculation
Y(f,m)=W(f,m)X(f,m) (Step S58). The calculated separated signal values
Y_1(f,m),...,Y_N(f,m) are stored in temporary memory unit 522 and fed back to
permutation problem resolution unit 513, and are also sent to time domain
transformation unit 520. Time domain transformation unit 520 then
transforms the separated signal values Y_1(f,m),...,Y_N(f,m) into time-domain
signals y_1(t),...,y_N(t) by performing a short-time inverse Fourier transform or
the like for each subscript i (Step S59), thereby yielding the time-domain
separated signal values y_i(t).

[0142] DETAILED DESCRIPTION OF THE PROCESSING IN
MIXING MATRIX ESTIMATION UNIT 512

Next, the processing performed in mixing matrix estimation unit
512 is described in detail. Note that the following processing is applied to
each frequency.

First, clustering unit 512a collects together the observed signal
components X_1(f,m),...,X_M(f,m) of all the sensors read in from temporary
memory unit 522, and associates them together as an observed signal vector
X(f,m)=[X_1(f,m),...,X_M(f,m)]^T. Clustering unit 512a then performs clustering
to generate N clusters C_i(f) equal in number to the number of signal sources,
and stores them in temporary memory unit 522 (Step S52).

[0143] Here, a cluster is a set of observed signal vectors X(f,m), and is
expressed using the set T_i of discrete time intervals in the form
C_i(f)={X(f,m)|m∈T_i}. Also, the aim of clustering is to classify samples
(observed signal vectors X(f,m)) in which the same signal source is
predominant (having the main component) into the same cluster. The resulting
N clusters C_1(f),...,C_N(f) do not necessarily need to be mutually disjoint
(C_i(f)∩C_j(f)=∅, i≠j), and there may be some elements that do not belong to
any cluster:
FORMULA 40

$$ X(f,m) \notin \bigcup_{i=1}^{N} C_i(f) $$

[0144] Next, representative vector calculation unit 512b reads in each
cluster C_i(f) from temporary memory unit 522, and calculates the mean of the
samples X(f,m) belonging to each cluster C_i(f):
FORMULA 41

$$ a_i(f) = \sum_{X(f,m) \in C_i(f)} X(f,m) \,/\, |C_i(f)| $$

as a representative vector a_i(f) for each signal source (Step S53).
Alternatively, the samples X(f,m) belonging to each cluster C_i(f) may be
suitably quantized and the most frequent values chosen as the representative
vectors a_i(f).
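As a small illustration of the cluster mean in Formula 41, the following sketch averages the complex sample vectors of one cluster component by component. The function name and the sample values are hypothetical; a cluster is assumed to be a non-empty list of equal-length complex vectors.

```python
def representative_vector(cluster):
    """Mean of the complex sample vectors in one cluster (cf. Formula 41)."""
    count = len(cluster)
    # zip(*cluster) groups the j-th components of all samples together.
    return [sum(component) / count for component in zip(*cluster)]


# Hypothetical cluster with two samples observed by M = 2 sensors.
cluster = [[1 + 1j, 2 + 0j], [3 - 1j, 0 + 2j]]
a = representative_vector(cluster)
```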

[0145] Finally, the N representative vectors are collected together in
vector combination unit 512c, which generates and outputs an estimated
mixing matrix A(f)=[a_1(f),...,a_N(f)] which is an estimate of the mixing
matrix H(f)=[h_1(f),...,h_N(f)] (Step S54). The estimated mixing matrix A(f) is
indeterminate with regard to the ordering of each vector (permutation
indeterminacy) and indeterminate with regard to the magnitude of each vector
(scaling indeterminacy). In other words, a representative vector a_i(f) is
estimated as a vector which is h_Π(i)(f) multiplied by a complex number. Here,
Π is a permutation representing the permutation indeterminacy.
[0146] DETAILED DESCRIPTION OF THE PROCESSING IN
CLUSTERING UNIT 512a

Next, the processing performed in clustering unit 512a is
described in more detail.

The clustering unit 512a in this example performs clustering after
each sample has been normalized in normalization unit 512aa so that
clustering can be suitably achieved, i.e., so that samples (observed signal
vectors X(f,m)) in which the same signal source is dominant are classified
into the same clusters.

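[0147] Specifically, the normalization unit 512aa of this example
performs the following calculation:
FORMULA 42

$$ \mathrm{sign}(X_j(f,m)) = \begin{cases} X_j(f,m)/|X_j(f,m)| & (|X_j(f,m)| \neq 0) \\ 0 & (|X_j(f,m)| = 0) \end{cases} \qquad (35) $$

and normalizes each observed signal vector X(f,m) as follows:

$$ X(f,m) \leftarrow \begin{cases} X(f,m)/\mathrm{sign}(X_j(f,m)) & (|X_j(f,m)| \neq 0) \\ X(f,m) & (|X_j(f,m)| = 0) \end{cases} \qquad (36) $$

before performing clustering.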

If necessary, clustering may be performed after the normalization unit
512aa of this example has performed additional normalization as follows:
FORMULA 43

$$ X(f,m) \leftarrow \begin{cases} X(f,m)/\|X(f,m)\| & (\|X(f,m)\| \neq 0) \\ X(f,m) & (\|X(f,m)\| = 0) \end{cases} \qquad (37) $$

Here, the vector length ||X(f,m)|| is the norm of X(f,m), which can, for
example, be obtained from the L_2 norm ||X(f,m)||=L_2(X(f,m)) defined as
follows:

FORMULA 44

$$ L_k(X(f,m)) = \left( \sum_{j=1}^{M} |X_j(f,m)|^k \right)^{1/k} \qquad (38) $$
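The two normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, vectors are plain lists of complex numbers, and the reference sensor j is given as a zero-based list index rather than the one-based sensor number used in the text.

```python
def sign(z):
    """Formula (35): unit-modulus phase factor of z, or 0 when z is 0."""
    return z / abs(z) if abs(z) != 0 else 0


def normalize_phase(x, j=0):
    """Formula (36): divide the whole vector by sign(x[j])."""
    s = sign(x[j])
    return [v / s for v in x] if s != 0 else list(x)


def lk_norm(x, k=2):
    """Formula (38): (sum_j |x_j|^k)^(1/k)."""
    return sum(abs(v) ** k for v in x) ** (1.0 / k)


def normalize_length(x, k=2):
    """Formula (37): scale the vector to unit norm (unchanged if it is zero)."""
    n = lk_norm(x, k)
    return [v / n for v in x] if n != 0 else list(x)


x = [3j, 4 + 0j]                     # hypothetical observation, M = 2 sensors
x_phase = normalize_phase(x)         # reference component becomes real and >= 0
x_unit = normalize_length(x_phase)   # then the vector has unit L2 norm
```

After the first step the reference component carries no phase, so only the relative phase and amplitude between sensors remain; the second step removes the dependence on the source magnitude.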



[0148] Also, as the clustering method, it is possible to use a method
described in many textbooks, such as hierarchical clustering or k-means
clustering (see, e.g., Morio Onoe (trans.): "Pattern Classification," Shingijutsu
Communications, ISBN 4-915851-24-9, chapter 10). Note that in any
clustering method, the distance between two samples X(f,m) and X'(f,m) is
defined as a means of measuring the proximity between samples, and
clustering is performed so that, as far as possible, samples that are
close to each other are included in the same cluster.

[0149] For example, when the samples are normalized only by Formula
(36) above, clustering unit 512a performs clustering using the cosine distance
between two normalized observed signal vectors X(f,m) as a measure of the
distance between them. The cosine distance between two samples X(f,m) and
X'(f,m) is defined as follows:

$$ 1 - X^H(f,m) \cdot X'(f,m) \,/\, (\|X(f,m)\| \cdot \|X'(f,m)\|) \qquad (39) $$
[0150] When the samples have been normalized according to the
abovementioned Formula (36) and Formula (37), clustering unit 512a
performs clustering in cluster generation unit 512ab using the L_2 norm
||X(f,m)-X'(f,m)|| = L_2(X(f,m)-X'(f,m)) of the difference between two
normalized observed signal vectors X(f,m) and X'(f,m), or an L_k norm with any
value of k, or the cosine distance (Formula (39)) as a measure of the distance
between them.
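The distance measures mentioned above can be sketched as follows. One point is an assumption of this sketch and not something stated in the text: the absolute value of the inner product is taken in the cosine distance so that the result is a real number even for complex vectors. All function names are hypothetical.

```python
def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm(x):
    """L2 norm of a complex vector."""
    return abs(inner(x, x)) ** 0.5


def cosine_distance(x, y):
    """Cf. Formula (39); the absolute value is an assumption of this sketch."""
    return 1.0 - abs(inner(x, y)) / (norm(x) * norm(y))


def l2_distance(x, y):
    """L2 norm of the difference of two vectors."""
    return norm([a - b for a, b in zip(x, y)])


x = [1 + 0j, 1j]
parallel = [2 + 0j, 2j]       # same direction as x, different length
orthogonal = [1 + 0j, -1j]    # x^H orthogonal = 0
```

The cosine distance ignores vector length, which is why it suits samples normalized by Formula (36) alone; once lengths are also normalized by Formula (37), the plain L_2 distance becomes usable as well.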

The reason why the above operations result in the representative
vector a_i(f) of each cluster C_i becoming an estimate of the mixing vector
h_k(f) (including magnitude indeterminacy) is explained below.

[0151] A cluster C_i contains a collection of observed signal vectors
X(f,m) in which only a certain source signal S_k is predominant and the other
source signals are close to zero. This state can be approximated as follows:

$$ X(f,m) = h_k(f) S_k(f,m) \qquad (40) $$

Normalizing this expression by Formula (36) yields the following:

$$ X \leftarrow X/\mathrm{sign}(X_j) = h_k S_k/\mathrm{sign}(H_{jk} S_k) = \mathrm{sign}^*(H_{jk}) |S_k| h_k \qquad (41) $$

This uses the relationships sign(H_jk S_k) = sign(H_jk) sign(S_k),
1/sign(H_jk) = sign*(H_jk) (where * is the complex conjugate operator), and
S_k/sign(S_k) = |S_k|. The parameters f and m have been omitted from these
formulae.

[0152] Normalizing by Formula (37) and applying Formula (40) yields
the following:

$$ X \leftarrow X/\|X\| = \mathrm{sign}^*(H_{jk}) |S_k| h_k / (|S_k| \cdot \|h_k\|) = \mathrm{sign}^*(H_{jk}) h_k/\|h_k\| \qquad (42) $$

Here, the relationship ||sign*(H_jk)|S_k|h_k|| = |S_k|·||h_k|| is used. The
parameters f and m have also been omitted from these formulae.

From Formula (41), it can be seen that the observed signal vectors
X(f,m) normalized by Formula (36) will collect together on a straight line
corresponding to the mixing vector h_k(f) multiplied by sign*(H_jk(f)). The
position of each vector on this straight line depends on the magnitude
|S_k(f,m)| of the signal source. Also, from Formula (42), it can be seen that
the observed signal vectors X(f,m) normalized by Formula (37) will collect
together at a single point sign*(H_jk(f))h_k(f)/||h_k(f)|| in complex space.
This shows that the representative vector a_i(f) calculated as the mean of the
normalized observed signal vectors X(f,m) constitutes an estimate of the mixing
vector h_k(f) including the magnitude indeterminacy.
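The conclusion of this derivation can be checked numerically. The sketch below builds observations X = h_k S_k for several arbitrary source values and confirms that both normalizations map all of them to the same point sign*(H_jk) h_k/||h_k||. The mixing vector and source values are hypothetical, and the reference sensor is taken as list index 0.

```python
import cmath


def sign(z):
    """Unit-modulus phase factor of z, or 0 when z is 0."""
    return z / abs(z) if abs(z) != 0 else 0


def normalize(x, j=0):
    """Apply the phase normalization and then the length normalization."""
    s = sign(x[j])
    if s != 0:
        x = [v / s for v in x]
    n = sum(abs(v) ** 2 for v in x) ** 0.5
    return [v / n for v in x] if n != 0 else list(x)


# Hypothetical mixing vector h_k and three source values S_k(f,m).
h = [0.8 * cmath.exp(0.3j), 0.5 * cmath.exp(-1.1j)]
sources = [2.0 * cmath.exp(1.0j), 0.1 * cmath.exp(-2.5j), 7.0 * cmath.exp(0.4j)]
normalized = [normalize([hj * s for hj in h]) for s in sources]

# Point predicted by the derivation: sign*(H_jk) h_k / ||h_k||.
h_norm = sum(abs(v) ** 2 for v in h) ** 0.5
expected = [sign(h[0]).conjugate() * v / h_norm for v in h]
```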

[0153] DETAILED DESCRIPTION OF THE PROCESSING IN
PERMUTATION PROBLEM RESOLUTION UNIT 513

Next, the processing performed in permutation problem resolution
unit 513 is described in detail.

In permutation problem resolution unit 513, the columns of the
estimated mixing matrix A(f) calculated at each frequency f are sorted so that
the representative vectors a_i(f) relating to the same signal source s_k(t) have
the same subscript i at all frequencies f (Step S55). Specifically, the
subscripts i are reassigned so that the correspondence between each separated
signal Y_1(f,m),...,Y_N(f,m) and each signal source becomes the same at all
frequencies f. For this purpose it is possible to use, for example, two types of
information based on the procedure of Non-Patent Reference 2, as in the prior
art.

[0154] The first type of information is positional information such as the
signal source arrival directions or the like. In methods that use conventional
ICA, the separation matrix W is obtained by ICA, and positional information
has been obtained from the Moore-Penrose pseudo-inverse matrix W^+ (which
corresponds to the inverse matrix W^-1 when M=N) of this matrix. Here, the
Moore-Penrose pseudo-inverse matrix W^+ is regarded as an estimate of the
mixing matrix A(f). Consequently, in this embodiment, unlike conventional
methods where ICA is used, the estimated mixing matrix A(f) is itself
regarded as a Moore-Penrose pseudo-inverse matrix W^+, and the positional
information can be obtained directly from each column of this matrix.
Specifically, the positional information can be obtained from the following
formula, for example:
FORMULA 45

$$ \theta_i = \cos^{-1} \frac{\mathrm{angle}(A_{ji}(f)/A_{j'i}(f))}{2\pi f c^{-1} \|d_j - d_{j'}\|} \qquad (43) $$

Here, θ_i is the angle between the straight line connecting sensor j and sensor j'
and the straight line from the mid point between sensor j and sensor j' to the
location of signal source i. Also, d_j is a vector representing the position of
sensor j. To resolve the permutation problem, the columns of the estimated
mixing matrix A(f) are sorted so that the correspondence between each
subscript i and each value of θ_i becomes the same at each frequency, for
example.
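Formula (43) can be sketched as follows. Several things here are assumptions of the sketch rather than statements of the text: c is taken to be the propagation speed of the signal (340 m/s is used as a nominal speed of sound), the function name is hypothetical, and the cosine is clamped to [-1, 1] so that estimation noise cannot push it outside the domain of the arccosine.

```python
import cmath
import math


def arrival_angle(a_ji, a_jpi, freq_hz, d_j, d_jp, c=340.0):
    """Angle theta_i in radians from two elements of one column of A(f)."""
    phase = cmath.phase(a_ji / a_jpi)          # angle(A_ji / A_j'i)
    spacing = math.dist(d_j, d_jp)             # ||d_j - d_j'||
    cos_theta = phase / (2.0 * math.pi * freq_hz * spacing / c)
    cos_theta = max(-1.0, min(1.0, cos_theta)) # guard against noise
    return math.acos(cos_theta)


# Synthetic check: build the inter-sensor phase for a source at 60 degrees.
freq_hz, spacing, c = 1000.0, 0.04, 340.0
true_theta = math.radians(60.0)
phase = 2.0 * math.pi * freq_hz * spacing * math.cos(true_theta) / c
theta = arrival_angle(cmath.exp(1j * phase), 1 + 0j, freq_hz,
                      (0.0, 0.0), (spacing, 0.0), c)
```

Sorting the columns of A(f) by the resulting angle at every frequency then gives a consistent ordering, provided the sensor spacing is small enough that the phase does not wrap.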

[0155] The second type of information is the correlation between
frequencies of the absolute values |Y_i(f,m)| of the separated signal
components, as used in conventional ICA methods. Specifically, the
permutation problem is resolved by, for example, sorting the columns of the
estimated mixing matrix A(f) so as to maximize the correlation between the
absolute values of the separated signal components for the same subscript i at
different frequencies f1 and f2:
FORMULA 46

$$ \mathrm{cor}(v_i^{f1}, v_i^{f2}) = \frac{\langle v_i^{f1}(m) \cdot v_i^{f2}(m) \rangle_m}{\sqrt{\langle v_i^{f1}(m)^2 \rangle_m} \cdot \sqrt{\langle v_i^{f2}(m)^2 \rangle_m}} $$

(Here, v_i^f(m) = |Y_i(f,m)| - <|Y_i(f,m)|>_m, and the notation <·>_m represents
the mean of "·" with respect to time m.)

The separated signals used in this process are obtained by feeding
back the outputs Y_1(f,m),...,Y_N(f,m) of separated signal generation unit 519.
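The envelope correlation can be sketched as below. The function name and the sample sequences are hypothetical; each input is assumed to be the sequence over time m of one separated component at one frequency, and both sequences are assumed to have non-constant magnitude so that the denominator is non-zero.

```python
def envelope_correlation(y_f1, y_f2):
    """Correlation of centred magnitude envelopes over time m (cf. Formula 46)."""
    def centred(values):
        mags = [abs(v) for v in values]
        mean = sum(mags) / len(mags)
        return [g - mean for g in mags]

    v1, v2 = centred(y_f1), centred(y_f2)
    count = len(v1)
    numerator = sum(a * b for a, b in zip(v1, v2)) / count
    denominator = ((sum(a * a for a in v1) / count) ** 0.5
                   * (sum(b * b for b in v2) / count) ** 0.5)
    return numerator / denominator


y_f1 = [1 + 0j, 2j, -3 + 0j, 4 + 0j]
same_source = [2j, 4 + 0j, 6j, -8 + 0j]    # envelope rises together with y_f1
other_source = [4 + 0j, 3j, 2 + 0j, 1j]    # envelope falls while y_f1 rises
r_same = envelope_correlation(y_f1, same_source)
r_other = envelope_correlation(y_f1, other_source)
```

Because only magnitudes enter the measure, it is insensitive to the phase of the separated components; the column order at frequency f2 would be chosen so that components paired with the same subscript give the larger correlation.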
[0156] DETAILED DESCRIPTION OF THE PROCESSING IN
SCALING PROBLEM RESOLUTION UNIT 514

Next, the processing performed in scaling problem resolution unit
514 is described in detail.

Scaling problem resolution unit 514 receives the estimated mixing
matrix A(f) from permutation problem resolution unit 513, and in order to
resolve the indeterminacy in the magnitude of each column, it first normalizes
each column (representative vector) a_i(f) of the estimated mixing matrix A(f)
as follows (Step S56):

$$ a_i(f) \leftarrow a_i(f)/a_{ji}(f) $$

where a_ji is the element in the j-th row of representative vector a_i(f).
Alternatively, a different j could be chosen for each representative vector
a_i(f), but it is important to use the same value of j for the same value of i
at each frequency f.
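The column normalization of Step S56 can be sketched as follows. The function name is hypothetical, the matrix is represented as a list of column vectors, the row j is a zero-based index, and the reference element of every column is assumed to be non-zero.

```python
def resolve_scaling(columns, j=0):
    """Divide each column a_i by its j-th element so that element becomes 1."""
    return [[v / col[j] for v in col] for col in columns]


# Hypothetical estimated mixing matrix with N = 2 columns and M = 2 rows.
columns = [[2j, 4 + 0j], [1 + 1j, 2 + 0j]]
scaled = resolve_scaling(columns)
```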
[0157] DETAILED DESCRIPTION OF THE SEPARATED SIGNAL
GENERATION PROCESS

Next, the separated signal generation process is described in detail.

In this embodiment, the procedure used to generate the separated
signals differs depending on whether or not there is a sufficient number of
sensors with respect to the number of signal sources.

First, when there is a sufficient number of sensors (M≥N), the
separated signals can be generated easily. Specifically, separation matrix
generation unit 518 receives the estimated mixing matrix A(f) from scaling
problem resolution unit 514, and calculates the Moore-Penrose pseudo-inverse
matrix A(f)^+ (equivalent to the inverse matrix A(f)^-1 when M=N) as
the separation matrix W(f) (Step S57). The resulting separation matrix W(f) is
stored in temporary memory unit 522. Separated signal generation unit 519
reads this separation matrix W(f) and the observed signal vectors X(f,m) from
temporary memory unit 522, and uses them to generate the separated signal
components Y_1(f,m),...,Y_N(f,m) by performing the calculation Y(f,m) =
W(f)X(f,m) (Step S58).
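For the case M≥N, the separation step can be sketched in plain Python as follows. This is an illustration under stated assumptions: the pseudo-inverse is computed as (A^H A)^-1 A^H, which is valid only when A(f) has full column rank; the small Gauss-Jordan solver and all names and numeric values are hypothetical.

```python
def conj_transpose(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(g, b):
    """Solve g @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(g)
    aug = [list(g[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudo_inverse(a):
    """Moore-Penrose pseudo-inverse (A^H A)^-1 A^H; assumes full column rank."""
    ah = conj_transpose(a)
    return solve(matmul(ah, a), ah)


# M = 3 sensors, N = 2 sources: hypothetical estimated mixing matrix A(f).
A = [[1 + 0j, 1 + 0j],
     [0.5j, -0.5j],
     [0.25 + 0j, 0.25 + 0j]]
S = [[2 - 1j], [0.5 + 3j]]   # source values at one time-frequency point
X = matmul(A, S)             # observed vector X(f,m)
W = pseudo_inverse(A)        # separation matrix W(f), N rows by M columns
Y = matmul(W, X)             # separated vector Y(f,m)
```

With a noise-free observation and an exact mixing matrix, Y reproduces S; in practice A(f) is only an estimate, so the result is approximate.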

[0158] On the other hand, when the number of sensors is insufficient
(M<N), the separated signals Y(f,m) cannot be uniquely determined with
respect to the estimated mixing matrix A(f) and observed signal vectors
X(f,m). This is because there are infinitely many values of Y(f,m) that satisfy
the following relationship:
FORMULA 47

$$ X(f,m) = A(f) Y(f,m) = \sum_{i=1}^{N} a_i(f) Y_i(f,m) \qquad (45) $$

However, given the sparsity of the source signals, it is known that the
most accurate separated signal components are found at solutions of Y(f,m)
that minimize the L_1 norm:
FORMULA 48

$$ L_1(Y(f,m)) = \sum_{i=1}^{N} |Y_i(f,m)| \qquad (46) $$

(Shun-ichi Amari: "How Can Humans and Machines Distinguish Visual and
Auditory Signals? (Introduction)", Journal of IEICE, Vol. 87, No. 3, pp. 167,
March 2004). When separation is performed using this sort of minimizing
criterion, the separation matrix W(f,m) varies with time so that separation
matrix generation unit 518 calculates a time-dependent separation matrix
W(f,m) from the observed signal vector X(f,m) and estimated mixing matrix
A(f) at each time interval m (Step S57), and separated signal generation unit
519 calculates the separated signal components Y_1(f,m),...,Y_N(f,m) from the
formula Y(f,m)=W(f,m)X(f,m) (Step S58).

[0159] However, since the strict minimization of L_1(Y(f,m)) involves a
large computational load, the separation matrix W(f,m) is generated using an
approximate solution in this embodiment. This solution involves sequentially
selecting the columns (representative vectors) a_i(f) of estimated mixing matrix
A(f) that are oriented closest to the observed signal vectors X(f,m) (or to the
residual vector e at a certain point in time), and repeating the process until M
selections have been made.

FIG. 17 shows a flowchart illustrating the approximate solution
method of this embodiment. The process whereby the separation matrix
W(f,m) is calculated using the approximate solution method of this flowchart
is described below.
[0160] First, column selection unit 516 reads in estimated mixing matrix
A(f) and observed signal vector X(f,m) from temporary memory unit 522
(Step S61), initializes residual vector e with the observed signal vector
X(f,m), assigns a value of 1 to variable k (Step S62), and stores this
information in temporary memory unit 522.

Next, column selection unit 516 checks the value of variable k in
temporary memory unit 522, and judges whether or not k≤M (Step S63). If
k≤M, column selection unit 516 selects q(k) such that

$$ q(k) = \arg\max_i |a_i(f)^H \cdot e| / \|a_i(f)\| \qquad (47) $$

and stores the result of this selection in temporary memory unit 522 (Step
S64). Here, Formula (47) maximizes the absolute value of the dot product of
the residual vector e and the length-normalized column a_i(f)^H/||a_i(f)||; in
other words, it represents an operation for selecting the representative vector
a_i(f) closest to the direction of the residual vector e. The reason for selecting
the representative vector a_i(f) closest to the direction of the residual vector e is
that the residual vector e becomes smaller in the next iteration, and thus each
subsequent value of Y_i(f,m) becomes smaller so that ultimately it can be
expected that the L_1 norm of Y(f,m) defined by Formula (46) also becomes
smaller.

[0161] Next, column selection unit 516 sets up a matrix
Q=[a_q(1)(f),...,a_q(k)(f)] representing the subspace spanned by all the selected
representative vectors a_q(1)(f),...,a_q(k)(f) stored in temporary memory unit 522
(Step S65), and calculates P=Q(Q^H Q)^-1 Q^H (Step S66). Column selection unit
516 then updates the residual vector e with the calculation

$$ e = X(f,m) - P \cdot X(f,m) $$

and stores it in temporary memory unit 522 (Step S67).
[0162] Here, P·X(f,m) is the projection of observed signal vector X(f,m)
onto subspace Q; specifically, it is the component of the observed signal
vector X(f,m) that is represented by a linear combination of the
representative vectors a_q(1)(f),...,a_q(k)(f) that have so far been selected.
The remainder e=X(f,m)-P·X(f,m) is represented by the other vectors;
specifically, it is represented by the columns (representative vectors) a_q(i)
selected in the subsequent loop iterations.

After that, in order to select the next column in turn, column
selection unit 516 produces a new value of k by adding 1 to the variable k in
temporary memory unit 522, and returns to Step S63 (Step S68). Since the
residual vector e only includes components that are orthogonal to the
representative vectors a_q(i) that have already been selected, there is no
possibility of representative vectors that have already been selected being
selected again on the basis of maximizing the absolute value of the dot
product |a_i(f)^H·e|/||a_i(f)|| (Step S64).

[0163] Then, at Step S63, when column selection unit 516 judges that
k>M (equivalent to the selection of min(M,N) representative vectors a_i(f)),
column selection unit 516 ends the loop process of Steps S64-S68. At this
point, the M selected representative vectors a_q(i) span the full space, so the
residual vector e becomes zero. When the loop process of Steps S64-S68 ends,
matrix generation unit 517 reads these M selected representative vectors a_q(i)
from temporary memory unit 522, and generates column vectors a_i'(f,m) in
which the N-M representative vectors (column vectors) a_i(f) that were not
selected in the processing of Steps S63-S68 are set to zero (Step S69):



FORMULA 49

$$ a_i'(f,m) = \begin{cases} a_i(f) & i \in \{q(1),\ldots,q(M)\} \\ 0 & i \notin \{q(1),\ldots,q(M)\} \end{cases} \qquad (48) $$

Furthermore, matrix generation unit 517 calculates a matrix
A'(f,m)=[a_1'(f,m),...,a_N'(f,m)] whose columns are the column vectors a_i'(f,m)
of Formula (48) (this matrix is equivalent to the matrix A'(f,m) whose
columns are the min(M,N) selected representative vectors a_i(f) and the
max(N-M,0) zero vectors), and stores it in temporary memory unit 522 (Step
S70). The matrix A'(f,m) calculated in this way is an M-row x N-column matrix
of which N-M columns are zero vectors.

[0164] Separation matrix generation unit 518 reads this matrix A'(f,m)
from temporary memory unit 522, and generates its Moore-Penrose pseudo-inverse
matrix A'(f,m)^+ as separation matrix W(f,m) (Step S71). This is
equivalent to an N-row x M-column separation matrix W(f,m) which is the
Moore-Penrose pseudo-inverse matrix of an M-row x N-column matrix where
0 or more of the N representative vectors a_i(f) have been substituted with zero
vectors.

[0165] The resulting separation matrix W(f,m) is stored in temporary
memory unit 522. Separated signal generation unit 519 reads in this
separation matrix W(f,m) and the observed signal vectors X(f,m) from
temporary memory unit 522, and uses the formula Y(f,m)=W(f,m)X(f,m) to
generate the separated signal components Y_1(f,m),...,Y_N(f,m), which it stores
in temporary memory unit 522 (Step S58). Note that N-M elements of the
separated signal components Y_1(f,m),...,Y_N(f,m) generated in this way will
inevitably be zero. That is, by performing the processing of Steps S61-S71
for a certain discrete time interval m, it is only possible to ascertain a
maximum of M separated signal components. Consequently, in this
embodiment the abovementioned processes of selecting M representative
vectors a_i(f), generating matrix A'(f,m), calculating the separation matrix
W(f,m), calculating the separated signal vectors Y(f,m), and transforming
them into time-domain signal values y_1(t),...,y_N(t) are performed separately in
each discrete time interval m. In this way, it is possible to ascertain all the
separated signal components.
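The greedy procedure of Steps S61-S71 can be sketched for a single time-frequency point as follows. The sketch makes several assumptions that are not in the text: the projection X - P·X is computed through a Gram-Schmidt orthonormal basis of the selected columns (which spans the same subspace as Q), already-selected columns are excluded explicitly as a numerical safeguard, and the final solve is written out only for M = 2 sensors. Solving with the selected columns and leaving the other entries at zero gives the same Y(f,m) as multiplying by the pseudo-inverse of A'(f,m), because zero columns of A'(f,m) correspond to zero rows of its pseudo-inverse. All names and values are hypothetical.

```python
def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm(x):
    """L2 norm of a complex vector."""
    return abs(inner(x, x)) ** 0.5


def select_columns(columns, x, count):
    """Greedy selection: returns the indices q(1..count) and the final residual."""
    residual, basis, chosen = list(x), [], []
    for _ in range(count):
        scores = [(-1.0 if i in chosen else abs(inner(a, residual)) / norm(a))
                  for i, a in enumerate(columns)]
        q = max(range(len(columns)), key=scores.__getitem__)
        chosen.append(q)
        # Orthonormalise the new column against the columns selected so far.
        u = list(columns[q])
        for b in basis:
            coeff = inner(b, u)
            u = [ui - coeff * bi for ui, bi in zip(u, b)]
        length = norm(u)
        basis.append([ui / length for ui in u])
        # Residual e = X minus its projection onto the selected subspace.
        residual = list(x)
        for b in basis:
            coeff = inner(b, x)
            residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return chosen, residual


def separate_two_sensors(columns, x):
    """M = 2 only: solve for the two selected sources, all others stay zero."""
    (q1, q2), _ = select_columns(columns, x, 2)
    a, b = columns[q1], columns[q2]
    det = a[0] * b[1] - a[1] * b[0]
    y = [0j] * len(columns)
    y[q1] = (x[0] * b[1] - x[1] * b[0]) / det   # Cramer's rule
    y[q2] = (a[0] * x[1] - a[1] * x[0]) / det
    return y


# N = 3 sources, M = 2 sensors; only sources 2 and 3 are active at this point.
columns = [[1 + 0j, 0j], [0j, 1 + 0j], [0.8 + 0j, 0.6 + 0j]]
true_y = [0j, 0.5j, 3 + 0j]
X = [sum(col[r] * y for col, y in zip(columns, true_y)) for r in range(2)]
chosen, residual = select_columns(columns, X, 2)
Y = separate_two_sensors(columns, X)
```

In this example the greedy choice happens to recover the two active sources exactly. Because the method is an approximation to the L_1 minimization, other inputs can lead to a different pair of columns being selected.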

[0166] ADVANTAGES OF THIS EMBODIMENT
BLIND SOURCE SEPARATION WITH N>M

As described above, with this embodiment it is possible to
achieve blind source separation even when the number of sensors is smaller
than the number of signal sources (N>M), as long as the source signals exhibit
some degree of sparsity. As a result, it allows the number of sensors to be
reduced and can contribute to reducing the device costs.

EFFECTS OF NORMALIZATION

The graphs in FIG. 18-FIG. 23 show some examples of the effects
of performing normalization in normalization unit 512aa. These show the
observed signal vectors X(f,m) obtained at 2773 Hz when observing 1 or 2
speech sources with 2 microphones in a room with a reverberation time of
130 ms. Although these signals were observed with two microphones, they
lie in a four-dimensional space because the observed signal vectors
X(f,m) are complex vectors in the frequency domain. Consequently, in
FIG. 18-FIG. 23, these four dimensions are shown projected into a
two-dimensional space. In these figures, "imag" indicates the imaginary terms
of each observed signal, and "Real" indicates the real terms. Also, X_1
indicates data relating to the observed signal observed by the first
microphone, and X_2 indicates data relating to the observed signal observed by
the second microphone.
[0167] First, FIG. 18-FIG. 20 show the effects of normalization in the case
of a single audio source.

FIG. 18 shows a plot of the observed signal vectors X(f,m) before
normalization. In this example, the vectors form a cluster around the origin,
and it is not possible to obtain useful information from this cluster about
representative vector a_1(f) relating to source signal 1. On the other hand,
FIG. 19 shows a plot of the observed signal vectors X(f,m) after normalization
by Formula (36). In this example, the samples are distributed in a specific
direction from the origin. This direction corresponds to the representative
vector a_1(f) to be estimated. This provides useful information for determining
the representative vector a_1(f). Also, FIG. 20 shows a plot of the observed
signal vectors X(f,m) after normalization by Formula (37). In this example,
the vectors form a cluster at a location that is separate from the origin. The
vector joining the center of this cluster to the origin corresponds to the
representative vector a_1(f) to be estimated.

[0168] Next, FIG. 21-FIG. 23 show the effects of normalization in the case
of two audio sources.

FIG. 21 shows a plot of the observed signal vectors X(f,m) before
normalization. As in the case of a single audio source, it is not possible to
obtain useful information about the two audio sources in this case. FIG. 22
shows a plot of the observed signal vectors X(f,m) after normalization by
Formula (36). In this example, the samples are distributed in two directions
from the origin. These directions correspond to the representative vectors
a_1(f) and a_2(f) to be estimated. FIG. 23 shows a plot of the observed signal
vectors X(f,m) after normalization by Formula (37). In this example, the
vectors form two clusters at locations that are separate from the origin. The
vectors joining the centers of these clusters to the origin correspond to the
representative vectors a_1(f) and a_2(f) to be estimated.

[0169] EFFECTS OF GENERATING A SEPARATION MATRIX
USING AN APPROXIMATE SOLUTION

As mentioned above, when minimization is strictly performed in
the generation of a separation matrix W(f,m) in cases where N>M, the
computational load becomes very heavy. For example, since there are _N C_M
ways of making M selections from N representative vectors a_1(f),...,a_N(f), it
would strictly be necessary to search through all _N C_M combinations in order
to find the combination that minimizes the L_1 norm (Formula (46)). However,
with the approximate solution shown in FIG. 17, it is possible to make do with
a lower computational load because the number of loop iterations only needs to
correspond to the number of sensors M.
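The difference in workload can be made concrete with a short count. The sketch below simply compares the number of M-column combinations that an exhaustive search would have to examine with the number of selection passes made by the greedy loop; the function names and the example sizes are hypothetical.

```python
import math


def exhaustive_candidates(n_sources, m_sensors):
    """Number of M-column subsets an exhaustive L1 search would compare."""
    return math.comb(n_sources, m_sensors)


def greedy_selections(n_sources, m_sensors):
    """Number of selection passes made by the greedy loop."""
    return min(m_sensors, n_sources)


for n, m in [(3, 2), (8, 3), (20, 4)]:
    print(n, m, exhaustive_candidates(n, m), greedy_selections(n, m))
```

Each greedy pass still evaluates up to N inner products, so the comparison is between roughly M·N inner products and _N C_M full candidate solutions per time-frequency point.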

[0170] In this embodiment, it is assumed that different procedures are
used to generate the separation matrix W(f,m) depending on whether or not
there is a sufficient number of sensors with respect to the number of signal
sources (i.e., whether or not N≤M). However, it is also possible to use the
same routine to generate the separation matrix W(f,m) regardless of whether
or not there is a sufficient number of sensors with respect to the number of
signal sources.

FIG. 24 shows a flowchart illustrating this example.

In this variant, regardless of whether or not N≤M, column
selection unit 516 first reads in estimated mixing matrix A(f) and observed
signal vector X(f,m) from temporary memory unit 522 (Step S81), initializes
residual vector e with the observed signal vector X(f,m), and assigns a value
of 1 to variable k (Step S82). Column selection unit 516 then judges whether
or not k≤min(M,N) (Step S83), and if so it selects the column a_q(u)(f) that
maximizes the expression |a_q(u)(f)^H·e|/||a_q(u)(f)|| (where a^H is the
conjugate transpose of a) (Step S84), sets up a matrix
Q=[a_q(1)(f),...,a_q(k)(f)] representing the subspace spanned by all the
selected columns a_q(u) (u=1,...,k) (Step S85), calculates P=Q(Q^H Q)^-1 Q^H
(Step S86), updates the residual vector e according to the result of
calculating X(f,m)-P·X(f,m) (Step S87), updates the value of variable k by
adding 1 to it (Step S88), and returns to Step S83. In other words, the
processing of Steps S83-S88 is repeated min(M,N) times. Here, min(M,N) denotes
the smaller of the two values M and N, and max(N-M,0) denotes the value of N-M
or zero, whichever is the larger.
[0171] After that, column selection unit 516 stores the min(M,N)
representative vectors a_{q(u)}(f) thereby selected in temporary memory unit 522.

Next, matrix generation unit 517 reads in these min(M,N)
representative vectors a_{q(u)}(f) from temporary memory unit 522, generates the
column vectors a_i'(f,m) as follows (Step S89):
FORMULA 50

\[ a_i'(f,m) = \begin{cases} a_i(f) & i \in \{q(1),\ldots,q(\min(M,N))\} \\ 0 & i \notin \{q(1),\ldots,q(\min(M,N))\} \end{cases} \tag{49} \]

and generates the matrix A'(f,m)=[a_1'(f,m),...,a_N'(f,m)] whose columns consist
of min(M,N) representative vectors a_i(f) and max(N-M,0) zero vectors (Step
S90). After the resulting matrix A'(f,m) has been stored in temporary memory
unit 522, it is read in by separation matrix generation unit 518, which
calculates separation matrix W(f,m) as its Moore-Penrose pseudo-inverse
matrix A'(f,m)^+ (equivalent to the inverse matrix A'(f,m)^{-1} when M=N)
(Step S91). The result is an N-row × M-column separation matrix
W(f,m), namely the Moore-Penrose pseudo-inverse matrix of an M-row ×
N-column matrix in which zero or more of the N representative vectors a_i(f)
have been replaced with zero vectors.
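Steps S81 through S91 can be sketched for a single time-frequency bin (f,m) as follows. This is a minimal illustration using NumPy, not the device's actual implementation. The function name `separation_matrix` and its arguments are hypothetical, and the sketch assumes that the columns of A(f) are nonzero, that the selected columns are linearly independent, and that a column is never selected twice (the text does not state this explicitly).

```python
import numpy as np

def separation_matrix(A, x):
    """Greedy column selection and pseudo-inverse for one (f, m) bin.

    A: M x N estimated mixing matrix A(f); x: length-M observed vector X(f,m).
    Returns the N x M separation matrix W(f,m) and the selected column indices.
    """
    A = np.asarray(A, dtype=complex)
    x = np.asarray(x, dtype=complex)
    M, N = A.shape
    e = x.copy()                          # S82: residual starts as X(f,m)
    norms = np.linalg.norm(A, axis=0)     # ||a_i(f)|| for every column
    available = np.ones(N, dtype=bool)    # assumption: no column chosen twice
    selected = []
    for _ in range(min(M, N)):            # S83: repeat min(M, N) times
        score = np.abs(A.conj().T @ e) / norms        # S84: |a^H e| / ||a||
        score = np.where(available, score, -np.inf)
        q = int(np.argmax(score))
        selected.append(q)
        available[q] = False
        Q = A[:, selected]                            # S85
        P = Q @ np.linalg.inv(Q.conj().T @ Q) @ Q.conj().T   # S86
        e = x - P @ x                                 # S87
    A_prime = np.zeros_like(A)            # S89-S90: unselected columns stay zero
    A_prime[:, selected] = A[:, selected]
    W = np.linalg.pinv(A_prime)           # S91: Moore-Penrose pseudo-inverse
    return W, selected
```

With M=2 sensors and N=3 representative vectors, for example, the returned W has 3 rows and 2 columns, and the row corresponding to the unselected vector is zero, so the corresponding separated signal value is zero in that bin.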
[0172] VARIANTS, ETC.

The present invention is not limited to the embodiments
mentioned above. For example, in the first through eighth embodiments, the
extracted signals are combined after they have been returned to the time
domain, but when using a binary mask it is also possible to transform the
signals into the time domain after they have been combined in the frequency
domain.

FIG. 25 shows a partial block diagram illustrating a configuration
in which the signals are transformed into the time domain after they have
been combined in the frequency domain. The configuration shown in this
figure can be provided instead of limited signal
separation unit 60-k, time domain transformation unit 70-k and signal
combination unit 80 in FIG. 1.

[0173] In this example, the frequency-domain signal values
Y_{kq}^{Π_{kq}}(f,m) output by the limited signal separation units 601-k of
all systems k are combined in the frequency domain by signal combination
unit 602, and are then transformed into the time domain by time domain
transformation unit 603. When there is only one separated signal
Y_{kq}^{Π_{kq}}(f,m) having the same tag a_i at a certain frequency f, signal
combination unit 602 obtains the separated signal value from the following
formula:

Y_i(f,m) = Y_{kq}^{Π_{kq}}(f,m)

Also, when there are two or more separated signals Y_{kq}^{Π_{kq}}(f,m) having
the same tag a_i at a certain frequency f, Y_i(f,m) is obtained as, for
example, the mean of the separated signals Y_{kq}^{Π_{kq}}(f,m) having the same
tag a_i as follows:

FORMULA 51

\[ Y_i(f,m) = \frac{1}{K} \sum_{\Pi_{kq}=a_i} Y_{kq}^{\Pi_{kq}}(f,m) \]

(where K is the number of separated signals having the same tag a_i)
[0174] Finally, time domain transformation unit 603 performs a short
time inverse Fourier transform or the like to transform the output signal
values Y_i(f,m) that have been combined in the frequency domain into
time-domain signals y_i(t).
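The combination rule of signal combination unit 602 can be sketched for one time-frequency bin as follows. This is an illustrative sketch only: the function name `combine_by_tag` is hypothetical, and the tags are represented by arbitrary hashable labels standing in for the representative values a_i.

```python
from collections import defaultdict

def combine_by_tag(tagged_values):
    """Combine the separated signal values of one (f, m) bin by tag.

    tagged_values: iterable of (tag, complex value) pairs from all systems.
    A tag that occurs once keeps its value; a tag that occurs K times
    receives the mean of the K values.
    """
    groups = defaultdict(list)
    for tag, value in tagged_values:
        groups[tag].append(value)
    return {tag: sum(values) / len(values) for tag, values in groups.items()}
```

For instance, if two systems both output a value tagged "a1" in the same bin, the combined value for "a1" is their mean, while a value tagged "a2" that appears only once is passed through unchanged.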

Also, in the first through eighth embodiments, the signal
combination process was performed by applying a tag to each separated
signal, but instead of applying a tag to each separated signal, the output
signals could be combined by using temporary memory unit 90 to store the set
G_k of V representative values corresponding to the signals separated in each
system k.

[0175] For example, when G_k does not include the same representative
values in a plurality of systems, all the separated signals y_{kq}(t) could be
output as the final separated signals y_i(t) (i=1,...,N). Alternatively, all
the separated signals Y_{kq}(f,m) in the frequency domain could be taken as the
final separated signals Y_i(f,m) (i=1,...,N) in the frequency domain and then
transformed into time-domain signals.
[0176] Also, when G_k includes K of the same representative values
(where K≥2) in a plurality of systems, the signal correlations are calculated
for all combinations of the separated signals y_{kq}(t) of system k
(q=1,...,V_k; where V_k is the number of elements in G_k) with the separated
signals y_{k'r}(t) of system k' (r=1,...,V_{k'}), and the mean of y_{kq}(t) and
y_{k'r}(t) is obtained for the K elements between which there is a high
correlation. This is repeated for a plurality of systems that include the
same representative value, thereby combining the signals. Alternatively, by
performing the same operations on all the separated signals in the frequency
domain, the signals could be combined in the frequency domain and then
transformed into the time domain.
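The correlation-based combination of two systems can be sketched as follows, assuming real-valued time-domain signals of equal length. The function names and the threshold value are hypothetical; the text only requires that the correlation be "high" and does not specify how that is judged.

```python
from math import sqrt

def correlation(u, v):
    """Normalized correlation coefficient of two equal-length real signals."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den else 0.0

def merge_systems(sys_k, sys_k2, threshold=0.9):
    """Average each signal of system k with its best match in system k'.

    Signals without a sufficiently correlated partner are kept unchanged.
    The threshold is an assumed tuning parameter, not taken from the text.
    """
    merged, used = [], set()
    for y in sys_k:
        scores = [correlation(y, z) for z in sys_k2]
        best = max(range(len(sys_k2)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold and best not in used:
            used.add(best)
            merged.append([(a + b) / 2 for a, b in zip(y, sys_k2[best])])
        else:
            merged.append(list(y))
    # Signals of system k' that were not matched are passed through as-is.
    merged += [list(z) for i, z in enumerate(sys_k2) if i not in used]
    return merged
```

Applying `merge_systems` repeatedly over every pair of systems that share a representative value corresponds to the repetition described above.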
[0177] Furthermore, signal separation could be performed by a system
that combines elements of the abovementioned first through ninth
embodiments.

For example, the representative vectors could be obtained by the
method of the eighth embodiment, and then the limited signals could be
separated by the method of the second embodiment. For example, using the
representative vectors obtained by representative value calculation unit 430
(FIG. 13), Formula (18) of the second embodiment could be replaced by
obtaining M_k(f,m) and M_{DC}(f,m) as follows:
FORMULA 52

\[ M_k(f,m) = \begin{cases} 1 & D(X(f,m),a_k(f)) < \min_{j \neq k} D(X(f,m),a_j(f)) \\ 0 & \text{otherwise} \end{cases} \]

\[ M_{DC}(f,m) = \begin{cases} 1 & \max_{a_p(f) \in G_k} D(X(f,m),a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m),a_q(f)) \\ 0 & \text{otherwise} \end{cases} \]



(See FIG. 8 for an illustration of M_k(f,m) and M_{DC}(f,m)), and then the
limited signals could be separated by the same procedure as limited signal
separation unit 160-k in the second embodiment.

Here, instead of obtaining the abovementioned M_k(f,m) and
M_{DC}(f,m), it is also possible to determine \hat{X}_k(f,m) = M_k(f,m)X(f,m)
directly as follows:

FORMULA 53

\[ \hat{X}_k(f,m) = \begin{cases} X(f,m) & D(X(f,m),a_k(f)) < \min_{j \neq k} D(X(f,m),a_j(f)) \\ 0 & \text{otherwise} \end{cases} \]

(corresponding to the processing performed by mask generation unit 151-k
and multiplication arithmetic unit 161-k in FIG. 8), and to generate the
limited signal values as follows:

\[ \hat{X}_k(f,m) = \begin{cases} X(f,m) & \max_{a_p(f) \in G_k} D(X(f,m),a_p(f)) < \min_{a_q(f) \in G_k^c} D(X(f,m),a_q(f)) \\ 0 & \text{otherwise} \end{cases} \]

(corresponding to the processing performed by mask generation unit 151-k
and limited signal extraction unit 152-k in FIG. 8).
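The two masking rules can be sketched for a single time-frequency bin as follows. This is a minimal sketch under stated assumptions: the distance D is taken to be the Euclidean distance purely for illustration (the distance actually used is the one defined in the relevant embodiment), the group is assumed to be non-empty, and all function names are hypothetical.

```python
def distance(x, a):
    """Assumed Euclidean distance between complex vectors x and a."""
    return sum(abs(xi - ai) ** 2 for xi, ai in zip(x, a)) ** 0.5

def mask_single(x, reps, k):
    """M_k: 1 if x is strictly closer to reps[k] than to every other vector."""
    d_k = distance(x, reps[k])
    return 1 if all(d_k < distance(x, a)
                    for j, a in enumerate(reps) if j != k) else 0

def mask_group(x, reps, group):
    """M_DC: 1 if every vector in `group` is closer to x than every vector outside."""
    inside = [distance(x, reps[p]) for p in group]
    outside = [distance(x, a) for q, a in enumerate(reps) if q not in group]
    return 1 if max(inside) < min(outside, default=float("inf")) else 0

def limited_signal(x, mask):
    """Apply a binary mask to the observed vector: X(f,m) or the zero vector."""
    return [mask * xi for xi in x]
```

Computing the mask value and then multiplying, as in `limited_signal(x, mask_single(x, reps, k))`, gives the same result as selecting X(f,m) or zero directly from the distance comparison.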

[0178] Alternatively, instead of generating M_k(f,m) in mask generation
unit 151-k (FIG. 8), the representative vectors [a_1,...,a_N] (where a_i is a
column vector) obtained by representative value calculation unit 430 (FIG. 13)
could be gathered together as \hat{H} in mixing process estimation unit 162-k
(FIG. 8) and used as an estimated mixing matrix.

Also, instead of the Fourier transforms and inverse Fourier
transforms used to transform between the time domain and frequency domain
in the above embodiments, other forms of transformation can be used such as
wavelet transformation, DFT filter banks, or polyphase filter banks (see, e.g.,
R.E. Crochiere and L.R. Rabiner: "Multirate Digital Signal Processing,"
Englewood Cliffs, NJ: Prentice-Hall, 1983 (ISBN 0-13-605162-6)).
[0179] Computer implementations of the abovementioned first through
ninth embodiments can be configured as described below.

FIG. 26 shows an example of a signal separation device 610 in
which each embodiment is configured on a computer.

The signal separation device 610 of this example includes a CPU
(central processing unit) 620, RAM (random access memory) 630, ROM
(read only memory) 640, an external memory device 650, an input unit 660,
an interface 670, and a bus 680.
[0180] CPU 620 is a central processing unit such as a RISC (reduced
instruction set computer) or CISC (complex instruction set computer) chip
incorporating an arithmetic logical unit 621, a control unit 622, and a register
623. Register 623 is a fast memory such as DRAM (dynamic random access
memory) or SRAM (static random access memory), for example.
Also, RAM 630 is a rewritable semiconductor memory such as a
DRAM, SRAM, flash memory, or NV (nonvolatile) RAM. ROM 640 is a
read-only semiconductor memory such as an MROM (mask read-only
memory), for example, and is used to store various programs, data and the
like.

[0181] External memory device 650 is, for example, a hard disk device, a
magnetic recording device such as a flexible disk or magnetic tape device, an
optical disk device such as a DVD-RAM (random access memory) or CD-R
(recordable)/RW (rewritable) device, a magneto-optical recording device such
as an MO (magneto-optical) disc device, or a semiconductor memory such as
an EEPROM (electrically erasable programmable read-only memory) or
flash memory.

Input unit 660 is an input device such as a keyboard, mouse,
joystick or the like. Interface 670 is, for example, an input/output port that
is used for the input and/or output of data, and can be connected to various
types of device such as sensors, communication boards, and memory devices.

[0182] Furthermore, bus 680 is configured from, e.g., a data bus, address
bus, control bus and the like, and is electrically connected to CPU 620, RAM
630, ROM 640, external memory device 650, input unit 660 and interface 670
to allow them to exchange data with each other.

The processing algorithm performed by signal separation device
610 is, for example, written in the form of a signal separation program
which is, for example, recorded on a recording medium that can be read by
the computer. Examples of recording media that can be read by computer
include any kind of magnetic recording device, optical disk, magneto-optical
recording medium, semiconductor memory or the like; specifically, it is
possible to use a magnetic recording device such as a hard disk device, a
flexible disk device or a magnetic tape device, an optical disk such as a DVD
(digital versatile disc), DVD-RAM (random access memory), CD-ROM
(compact disc read only memory) or CD-R (recordable)/RW (rewritable), a
magneto-optical recording medium such as an MO (magneto-optical disc), or
a semiconductor memory such as an EEPROM (electrically erasable
programmable read-only memory).

[0183] The signal separation program can be distributed by such means as
selling, transferring, or lending a portable recording medium such as a DVD
or CD-ROM on which this program is recorded, for example. Alternatively, a
configuration could be employed whereby this program is stored in the
memory device of a server computer and distributed by transmitting the
program across a network from the server computer to another computer.

When processing is performed in signal separation device 610, it
first downloads into the program region 651 of external memory device 650 a
signal separation program recorded on a portable recording medium or a signal
separation program transmitted from a server computer, for example.

[0184] Also, the time-domain observed signals x_j(t) (j=1,...,M) observed
by each sensor are pre-stored in data region 652 of external memory device
650. The storage of these observed signals x_j(t) may be achieved by inputting
the observed signals x_j(t) sent from the sensors into interface 670 and storing
them in external memory device 650 via bus 680, or by employing a
configuration where the observed signals x_j(t) are pre-stored in external
memory device 650 by a separate device, and this external memory device
650 is then connected to bus 680.

Next, for example, under the control of control unit 622 of CPU
620, the signal separation program is sequentially read in from the program
region 651 of external memory device 650, and is stored in the program
region 631 of RAM 630. The signal separation program stored in RAM 630 is
read into CPU 620, and the control unit 622 of CPU 620 performs various
processes according to the contents of this signal separation program, such as
inputting and outputting data, performing computations in arithmetic logical
unit 621, and storing data in register 623.

[0185] When the processing by CPU 620 is started, CPU 620 reads out
the observed signals x_j(t) from data region 652 of external memory device
650, for example, and writes them to data region 632 of RAM 630, for
example. CPU 620 then performs each of the abovementioned processes
under the control of control unit 622 while sequentially reading out the signal
separation program from the program region 631 of RAM 630 and the observed
signals x_j(t) from data region 632. Here, for example, RAM 630 or
external memory device 650 performs the functions of memory unit 2 or 501
in the first through ninth embodiments, and RAM 630 or register 623
performs the functions of temporary memory unit 90 or 522 in the first
through ninth embodiments.

[0186] Also, as another embodiment of this program, CPU 620 could
read the program directly from a portable recording medium and execute the
processing according to this program, or CPU 620 could sequentially execute
processing according to programs transmitted to it from a server computer. It
is also possible to employ a configuration in which the abovementioned
processing is performed by a so-called ASP (application service provider),
whereby, instead of transmitting programs from a server computer to this
computer, the processing functions are implemented just by issuing
instructions and gathering results.

[0187] Furthermore, in addition to performing the above processes in the
temporal sequence described above, they may also be performed in parallel or
individually according to requirements or to increase the performance of the
device that executes the processing. And, needless to say, it is possible to
make other suitable modifications without departing from the essence of the
present invention.
POTENTIAL INDUSTRIAL USES 

[0188] With this invention, it is possible to separate and extract target
signals even, for example, in environments where various types of noise and
interference signals are present. For example, when applied to the field of
audio engineering, it could be used to construct a speech recognition system
that achieves a high recognition rate by separating and extracting the target
speech even in situations where the input microphones of the speech
recognition equipment are some distance away from the speaker and pick up
sounds other than the speech produced by the target speaker.



